A general method for obtaining moment inequalities for functions of independent random variables is presented. It is a generalization of the entropy method which has been used to derive concentration inequalities for such functions [Boucheron, Lugosi and Massart Ann. Probab. 31 (2003) 1583-1614], and is based on a generalized tensorization inequality due to Latała and Oleszkiewicz [Lecture Notes in Math. 1745 (2000) 147-168]. The new inequalities prove to be a versatile tool in a wide range of applications. We illustrate the power of the method by showing how it can be used to effortlessly re-derive classical inequalities including Rosenthal and Kahane-Khinchine-type inequalities for sums of independent random variables, moment inequalities for suprema of empirical processes and moment inequalities for Rademacher chaos and U-statistics. Some of these corollaries are apparently new. In particular, we generalize Talagrand’s exponential inequality for Rademacher chaos of order 2 to any order. We also discuss applications for other complex functions of independent random variables, such as suprema of Boolean polynomials which include, as special cases, subgraph counting problems in random graphs.

1. Introduction. During the last twenty years, the search for upper bounds for exponential moments of functions of independent random variables, that is, for concentration inequalities, has been a flourishing area of probability theory. Recent developments in random combinatorics, statistics and empirical process theory have prompted the search for moment inequalities dealing with possibly nonexponentially integrable random variables.

Paraphrasing Talagrand in [41], we may argue that 

While Rosenthal-Pinelis inequalities for higher moments of sums of independent random variables are at the core of classical probabilities, there is a need for new abstract inequalities for higher moments of more general functions of many independent random variables.

The aim of this paper is to provide such general-purpose inequalities. 
Our approach is based on a generalization of Ledoux’s entropy method (see 
[26, 28]). Ledoux’s method relies on abstract functional inequalities known 
as logarithmic Sobolev inequalities and provides a powerful tool for deriving 
exponential inequalities for functions of independent random variables; see 
[6, 7, 8, 14, 30, 31, 36] for various applications. To derive moment inequalities for general functions of independent random variables, we elaborate on the pioneering work of Latała and Oleszkiewicz [25] and describe so-called φ-Sobolev inequalities which interpolate between Poincaré’s inequality and logarithmic Sobolev inequalities (see also [4] and Bobkov’s arguments in [26]).

This paper proposes general-purpose inequalities for polynomial moments 
of functions of independent variables. Many of the results parallel those 
obtained in [7] for exponential moments, based on the entropy method. In 
fact, the exponential inequalities of [7] may be obtained (up to constants) 
as corollaries of the results presented here. 

Even though the new inequalities are designed to handle very general 
functions of independent random variables, they prove to be surprisingly 
powerful in bounding moments of well-understood functions such as sums 
of independent random variables and suprema of empirical processes. In 
particular, we show how to apply the new results to effortlessly re-derive 
Rosenthal and Kahane-Khinchine-type inequalities for sums of independent 
random variables, Pinelis’ moment inequalities for suprema of empirical processes and moment inequalities for Rademacher chaos. Some of these corollaries are apparently new. Here we mention Theorem 14 which generalizes
Talagrand’s (upper) tail bound [40] for Rademacher chaos of order 2 to 
Rademacher chaos of any order. We also provide some other examples such 
as suprema of Boolean polynomials which include, as special cases, subgraph 
counting problems in random graphs. 

The paper is organized as follows. In Section 2, we state the main results of this paper, Theorems 2-4, as well as a number of corollaries. The proofs of the main results are given in Sections 4 and 5. In Section 3, abstract φ-Sobolev inequalities which generalize logarithmic Sobolev inequalities are introduced. These inequalities are based on a “tensorization property” of certain functionals called φ-entropies. The tensorization property is based on a duality formula, stated in Lemma 1. In Appendix A.1, some further facts are gathered about the tensorization property of φ-entropies.

In Section 6, the main theorems are applied to sums of independent random variables. This leads quite easily to suitable versions of Marcinkiewicz’s, Rosenthal’s and Pinelis’ inequalities. In Section 7, Theorems 2 and 3 are applied to suprema of empirical processes indexed by possibly nonbounded functions, leading to a version of an inequality due to Giné, Latała and Zinn [16] with explicit and reasonable constants. In Section 8, we derive moment inequalities for conditional Rademacher averages. In Section 9, a new general moment inequality is obtained for Rademacher chaos of any order, which generalizes Talagrand’s inequality for Rademacher chaos of order 2. We also give a simple proof of Bonami’s inequality.

In Section 10, we consider suprema of Boolean polynomials. Such problems arise, for example, in random graph theory where an important special case is the number of small subgraphs in a random graph.

Some of the routine proofs are gathered in the Appendix. 

2. Main results. 


2.1. Notation. We begin by introducing some notation used throughout the paper. Let $X_1, \ldots, X_n$ denote independent random variables taking values in some measurable set $\mathcal{X}$. Denote by $X_1^n$ the vector of these $n$ random variables. Let $F: \mathcal{X}^n \to \mathbb{R}$ be some measurable function. We are concerned with moment inequalities for the random variable

$$Z = F(X_1, \ldots, X_n).$$

Throughout, $\mathbb{E}[Z]$ denotes expectation of $Z$ and $\mathbb{E}[Z \mid \mathcal{F}]$ denotes conditional expectation with respect to $\mathcal{F}$. $X_1', \ldots, X_n'$ denote independent copies of $X_1, \ldots, X_n$, and we write

$$Z_i' = F(X_1, \ldots, X_{i-1}, X_i', X_{i+1}, \ldots, X_n).$$


Define the random variables $V^+$ and $V^-$ by

$$V^+ = \mathbb{E}\Big[\sum_{i=1}^n (Z - Z_i')_+^2 \,\Big|\, X_1^n\Big]$$

and

$$V^- = \mathbb{E}\Big[\sum_{i=1}^n (Z - Z_i')_-^2 \,\Big|\, X_1^n\Big],$$

where $x_+ = \max(x, 0)$ and $x_- = \max(-x, 0)$ denote the positive and negative parts of a real number $x$. The variables $V^+$ and $V^-$ play a central role in [7]. In particular, it is shown in [7] that the moment generating function








BOUCHERON, BOUSQUET, LUGOSI AND MASSART 


of $Z - \mathbb{E}[Z]$ may be bounded in terms of the moment generating functions of $V^+$ and $V^-$. The main results of the present paper relate the moments of $Z$ to lower-order moments of these variables.

In the sequel, $Z_i$ will denote an arbitrary measurable function $F_i$ of $X^{(i)} = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$, that is,

$$Z_i = F_i(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n).$$

Finally, define

$$V = \sum_{i=1}^n (Z - Z_i)^2.$$

Throughout the paper, the notation $\|Z\|_q$ is used for

$$\|Z\|_q = \big(\mathbb{E}[|Z|^q]\big)^{1/q},$$

where $q$ is a positive number.

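Next we introduce two constants used frequently in the paper. Let

$$\kappa = \frac{\sqrt{e}}{2(\sqrt{e} - 1)} < 1.271.$$

Let $\kappa_1 = 1$ and for any integer $q \ge 2$, define

$$\kappa_q = \frac{1}{2}\left(1 - \left(1 - \frac{1}{q}\right)^{q/2}\right)^{-1}.$$

Then $(\kappa_q)$ increases to $\kappa$ as $q$ goes to infinity. Also, define

$$K = \frac{1}{e - \sqrt{e}} < 0.935.$$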
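These constants are easy to evaluate numerically. The Python sketch below is our own check and is not part of the original text. The names `KAPPA`, `K` and `kappa_q` are ours. It computes $\kappa$, $K$ and $\kappa_q$, and it confirms the numerical bounds stated above and that $\kappa_q$ increases toward $\kappa$.

```python
import math

SQRT_E = math.sqrt(math.e)
KAPPA = SQRT_E / (2 * (SQRT_E - 1))  # kappa of Section 2.1
K = 1 / (math.e - SQRT_E)            # K of Section 2.1


def kappa_q(q):
    """kappa_q for an integer q >= 1, with the convention kappa_1 = 1."""
    if q == 1:
        return 1.0
    return 0.5 / (1 - (1 - 1 / q) ** (q / 2))


print(f"kappa = {KAPPA:.5f}, K = {K:.5f}")
print([round(kappa_q(q), 4) for q in (1, 2, 3, 10, 100, 10_000)])
```

Note that $\kappa_2 = 1 = \kappa_1$, so the sequence is only nondecreasing at its start.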


2.2. Basic theorems. Recall first one of the first general moment inequalities, proved by Efron and Stein [15], and further improved by Steele [37]:

Proposition 1 (Efron-Stein inequality).

$$\operatorname{Var}[Z] \le \frac{1}{2}\, \mathbb{E}\Big[\sum_{i=1}^n (Z - Z_i')^2\Big].$$

Note that this inequality becomes an equality if $F$ is the sum of its arguments. Generalizations of the Efron-Stein inequality to higher moments of sums of independent random variables have been known in the literature as Marcinkiewicz’s inequalities (see, e.g., [13], page 34). Our purpose is to describe conditions under which versions of Marcinkiewicz’s inequalities hold for general functions $F$.
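Proposition 1 can be checked by simulation. The sketch below is an illustrative Monte Carlo estimate and rests on our own assumption that the $X_i$ are i.i.d. Uniform(0, 1). The helper `efron_stein_check` is hypothetical. For $F$ equal to the sum, the two estimates should agree up to sampling error, which is the equality case mentioned above. For $F$ equal to the maximum, the Efron-Stein bound should exceed the variance.

```python
import random
from statistics import pvariance


def efron_stein_check(F, n, trials=20000, seed=0):
    """Monte Carlo estimates of Var[Z] and of the Efron-Stein bound
    (1/2) E[sum_i (Z - Z_i')^2] for Z = F(X_1, ..., X_n), X_i ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    zs, es_terms = [], []
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        z = F(x)
        zs.append(z)
        total = 0.0
        for i in range(n):
            x_prime = x.copy()
            x_prime[i] = rng.random()  # replace X_i by an independent copy X_i'
            total += (z - F(x_prime)) ** 2
        es_terms.append(0.5 * total)
    return pvariance(zs), sum(es_terms) / trials


print(efron_stein_check(max, 5))
print(efron_stein_check(sum, 5))
```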






MOMENT INEQUALITIES 




In [7], inequalities for exponential moments of $Z$ are derived in terms of the behavior of $V^+$ and $V^-$. This is quite convenient when exponential moments of $Z$ scale nicely with $n$. In many situations of interest this is not the case, and bounds on exponential moments of roots of $Z$ rather than bounds on exponential moments of $Z$ itself are obtained (e.g., the triangle counting problem in [7]). In such situations, relating the polynomial moments of $Z$ to $V^+$, $V^-$ or $V$ may prove more convenient.

In the simplest settings, $V^+$ and $V^-$ are bounded by a constant. It was shown in [7] that in this case $Z$ exhibits a sub-Gaussian behavior. Specifically, it is shown in [7] that if $V^+ \le c$ almost surely for some positive constant $c$, then for any $\lambda > 0$,

$$\mathbb{E}\, e^{\lambda (Z - \mathbb{E}[Z])} \le e^{\lambda^2 c}.$$

Our first introductory result implies sub-Gaussian bounds for the polynomial moments of $Z$:

Theorem 1. If $V^+ \le c$ for some constant $c > 0$, then for all integers $q \ge 2$,

$$\|(Z - \mathbb{E}[Z])_+\|_q \le \sqrt{Kqc}.$$

[Recall that $K = 1/(e - \sqrt{e}) < 0.935$.] If furthermore $V^- \le c$, then for all integers $q \ge 2$,

$$\|Z\|_q \le \mathbb{E}[Z] + 2^{1/q}\sqrt{Kqc}.$$
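The following is an illustrative check of Theorem 1. The example is our own and is not taken from the paper. Consider a Rademacher sum $Z = \sum_i a_i \varepsilon_i$ with $a_i \ge 0$. Replacing $\varepsilon_i$ by an independent copy changes $Z$ by $a_i(\varepsilon_i - \varepsilon_i')$. The positive part of this change is $2a_i$ with conditional probability $1/2$ when $\varepsilon_i = 1$, and it is $0$ otherwise. Hence $V^+ \le 2\sum_i a_i^2 =: c$. The sketch computes $\|(Z - \mathbb{E}[Z])_+\|_q$ exactly by enumerating all sign patterns and compares it with $\sqrt{Kqc}$. The helper names are hypothetical.

```python
import itertools
import math

K = 1.0 / (math.e - math.sqrt(math.e))  # the constant K of Section 2.1


def positive_part_moment(a, q):
    """Exact ||(Z - E[Z])_+||_q for Z = sum_i eps_i * a_i with Rademacher eps_i,
    obtained by enumerating all 2^n sign patterns (here E[Z] = 0)."""
    n = len(a)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        z = sum(s * ai for s, ai in zip(signs, a))
        total += max(z, 0.0) ** q
    return (total / 2 ** n) ** (1.0 / q)


def theorem1_bound(a, q):
    """Right-hand side sqrt(K q c) of Theorem 1 with c = 2 * sum(a_i^2) >= V^+."""
    c = 2.0 * sum(ai * ai for ai in a)
    return math.sqrt(K * q * c)


a = [1.0] * 10
for q in range(2, 9):
    print(q, positive_part_moment(a, q), theorem1_bound(a, q))
```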

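The main result of this paper is the following inequality.

Theorem 2. For any real $q \ge 2$,

$$\|(Z - \mathbb{E}[Z])_+\|_q \le \sqrt{\Big(1 - \frac{1}{q}\Big) 2\kappa_q q\, \|V^+\|_{q/2}} \le \sqrt{2\kappa q\, \|V^+\|_{q/2}} = \sqrt{2\kappa q}\, \big\|\sqrt{V^+}\big\|_q$$

and

$$\|(Z - \mathbb{E}[Z])_-\|_q \le \sqrt{\Big(1 - \frac{1}{q}\Big) 2\kappa_q q\, \|V^-\|_{q/2}} \le \sqrt{2\kappa q\, \|V^-\|_{q/2}} = \sqrt{2\kappa q}\, \big\|\sqrt{V^-}\big\|_q.$$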

Remark. To better understand our goal, recall Burkholder’s inequalities [9, 10] from martingale theory. Burkholder’s inequalities may be regarded as extensions of Marcinkiewicz’s inequalities to sums of martingale increments. They are natural candidates for deriving moment inequalities for a function $Z = F(X_1, \ldots, X_n)$ of many independent random variables. The approach mimics the method of bounded differences (see [32, 33]) classically used to derive Bernstein- or Hoeffding-like inequalities under similar circumstances. The method works as follows: let $\mathcal{F}_i$ denote the σ-algebra generated by the sequence $X_1^i = (X_1, \ldots, X_i)$. Then the sequence $M_i = \mathbb{E}[Z \mid \mathcal{F}_i]$ is an $\mathcal{F}_i$-adapted martingale (the Doob martingale associated with $Z$). Let $\langle Z \rangle$ denote the associated quadratic variation

$$\langle Z \rangle = \sum_{i=1}^n (M_i - M_{i-1})^2,$$

let $[Z]$ denote the associated predictable quadratic variation

$$[Z] = \sum_{i=1}^n \mathbb{E}\big[(M_i - M_{i-1})^2 \mid \mathcal{F}_{i-1}\big],$$

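and let $M$ be defined as $\max_{1 \le i \le n} |M_i - M_{i-1}|$. Burkholder’s inequalities [9, 10] (see also [12], page 384) imply that for $q \ge 2$,

$$\|Z - \mathbb{E}[Z]\|_q \le (q - 1)\sqrt{\|\langle Z \rangle\|_{q/2}} = (q - 1)\,\big\|\sqrt{\langle Z \rangle}\big\|_q.$$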

Note that the dependence on $q$ in this inequality differs from the dependence in Theorem 2. It is known that for general martingales, Burkholder’s inequality is essentially unimprovable (see [10], Theorem 3.3). (However, for the special case of the Doob martingale associated with $Z$ this bound is perhaps improvable.) The Burkholder-Rosenthal-Pinelis inequality ([34], Theorem 4.1) implies that there exists a universal constant $C$ such that

$$\|Z - \mathbb{E}[Z]\|_q \le C\Big(\sqrt{q\,\|[Z]\|_{q/2}} + q\,\|M\|_q\Big).$$

If one has some extra information on the sensitivity of Z with respect to 
its arguments, such inequalities may be used to develop a strict analogue 
of the method of bounded differences (see [33]) for moment inequalities. 
In principle such an approach should provide tight results, but finding good 
bounds on the moments of the quadratic variation process often proves quite 
difficult. 
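The Doob martingale construction can be made concrete with a small exact computation. The example is our own, and the helper names `doob_martingale` and `quadratic_variation` are hypothetical. We assume the $X_i$ are i.i.d. fair bits in $\{0, 1\}$. The sketch builds $M_i = \mathbb{E}[Z \mid \mathcal{F}_i]$ by averaging $F$ over the unobserved coordinates. It then checks that $M_n = Z$ and that $\mathbb{E}\langle Z \rangle = \operatorname{Var}(Z)$, that is, that the martingale increments are orthogonal. Exact rational arithmetic keeps the equalities exact.

```python
import itertools
from fractions import Fraction


def doob_martingale(F, n):
    """Map each outcome x of n i.i.d. fair bits to its Doob martingale path
    (M_0, ..., M_n), where M_i = E[Z | X_1, ..., X_i] and Z = F(X_1, ..., X_n)."""
    def cond_mean(prefix):
        # Average F over every completion of the observed prefix (X_1, ..., X_i).
        tails = list(itertools.product((0, 1), repeat=n - len(prefix)))
        return Fraction(sum(F(prefix + t) for t in tails), len(tails))
    outcomes = itertools.product((0, 1), repeat=n)
    return {x: [cond_mean(x[:i]) for i in range(n + 1)] for x in outcomes}


def quadratic_variation(path):
    """<Z> = sum_i (M_i - M_{i-1})^2 along one path of the martingale."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


F = lambda x: sum(x) ** 2
paths = doob_martingale(F, 4)
mean_z = paths[(0, 0, 0, 0)][0]  # M_0 = E[Z]
var_z = sum((F(x) - mean_z) ** 2 for x in paths) / len(paths)
mean_qv = sum(quadratic_variation(p) for p in paths.values()) / len(paths)
print(mean_z, var_z, mean_qv)
```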

The inequalities introduced in this paper have a form similar to those obtained by resorting to Doob’s martingale representation and Burkholder’s inequality. But, instead of relying on the quadratic variation process, they rely on a more tractable quantity. Indeed, in many cases $V^+$ and $V^-$ are easier to deal with than $[Z]$ or $\langle Z \rangle$.

Below we present two variants of Theorem 2 which may be more convenient in some applications.
Theorem 3. Assume that $Z_i \le Z$ for all $1 \le i \le n$. Then for any real $q \ge 2$,

$$\|(Z - \mathbb{E}[Z])_+\|_q \le \sqrt{\kappa_q q\, \|V\|_{q/2}} \le \sqrt{\kappa q\, \|V\|_{q/2}}.$$

Even though Theorem 2 provides some information concerning the growth of moments of $(Z - \mathbb{E}[Z])_-$, this information may be hard to exploit in concrete cases. The following result relates the moments of $(Z - \mathbb{E}[Z])_-$ with $\|V^+\|_q$ rather than with $\|V^-\|_q$. This requires certain boundedness assumptions on the increments of $Z$.


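Theorem 4. If for some positive random variable $M$,

$$(Z - Z_i')_+ \le M \quad \text{for every } 1 \le i \le n,$$

then for every real $q \ge 2$,

$$\|(Z - \mathbb{E}[Z])_-\|_q \le \sqrt{C_1 q \big(\|V^+\|_{q/2} \vee q\|M\|_q^2\big)},$$

where $C_1 < 4.16$. If, on the other hand,

$$0 \le Z - Z_i \le M \quad \text{for every } 1 \le i \le n,$$

then

$$\|(Z - \mathbb{E}[Z])_-\|_q \le \sqrt{C_2 q \big(\|V\|_{q/2} \vee q\|M\|_q^2\big)},$$

where $C_2 < 2.42$.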


2.3. Corollaries. Next we derive some general corollaries of the main theorems which provide explicit estimates under various typical conditions on the behavior of $V^-$ or $V$.

The first corollary, obtained from Theorem 3, is concerned with functionals $Z$ satisfying $V \le Z$. Such functionals were at the center of attention in [6] and [36] where they were called self-bounded functionals. They encompass sums of bounded nonnegative random variables, suprema of nonnegative empirical processes, configuration functions in the sense of [39] and conditional Rademacher averages [7]; see also [14] for other interesting applications.

Corollary 1. Assume that $0 \le Z - Z_i \le 1$ for all $i = 1, \ldots, n$ and that for some constant $A \ge 1$,

$$0 \le \sum_{i=1}^n (Z - Z_i) \le AZ.$$

Then for all integers $q \ge 1$,

$$\|Z\|_q \le \mathbb{E}[Z] + A\,\frac{q - 1}{2}, \tag{2.1}$$

and for every real $q \ge 2$,

$$\|(Z - \mathbb{E}[Z])_+\|_q \le \sqrt{\kappa}\left[\sqrt{Aq\,\mathbb{E}[Z]} + \frac{Aq}{2}\right]. \tag{2.2}$$

Moreover, for all integers $q \ge 2$,

$$\|(Z - \mathbb{E}[Z])_-\|_q \le \sqrt{CqA\,\mathbb{E}[Z]},$$

where $C < 1.131$.
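A simple instance of Corollary 1 is a sum of independent Bernoulli variables. Let $Z = X_1 + \cdots + X_n$ with $X_i \in \{0, 1\}$ and $Z_i = Z - X_i$. Then $0 \le Z - Z_i \le 1$ and $\sum_i (Z - Z_i) = Z$, so $A = 1$. The sketch below is our own illustration, and the parameters $n = 20$ and $p = 0.1$ are arbitrary. It evaluates $\|Z\|_q$ exactly from the binomial distribution and compares it with the right-hand side of (2.1).

```python
import math


def binomial_lq_norm(n, p, q):
    """Exact ||Z||_q = (E[Z^q])^(1/q) for Z ~ Binomial(n, p)."""
    moment = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) * k ** q
                 for k in range(n + 1))
    return moment ** (1.0 / q)


n, p = 20, 0.1                 # arbitrary illustrative parameters
for q in range(1, 9):
    lhs = binomial_lq_norm(n, p, q)
    rhs = n * p + (q - 1) / 2  # right-hand side of (2.1) with A = 1
    print(q, round(lhs, 4), rhs)
```

At $q = 1$ both sides equal $\mathbb{E}[Z]$, so (2.1) is tight there.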


The next corollary provides a simple sub-Gaussian bound for the lower tail whenever $V^-$ is bounded by a nondecreasing function of $Z$. A similar phenomenon was observed in ([7], Theorem 6).

Corollary 2. Assume that $V^- \le g(Z)$ for some nondecreasing function $g$. Then for all integers $q \ge 2$,

$$\|(Z - \mathbb{E}[Z])_-\|_q \le \sqrt{Kq\,\mathbb{E}[g(Z)]}.$$

Finally, the following corollary of Theorem 3 deals with a generalization of self-bounded functionals that was already considered in [7].

Corollary 3. Assume that $Z_i \le Z$ for all $i = 1, \ldots, n$ and $V \le WZ$ for a random variable $W \ge 0$. Then for all reals $q \ge 2$ and all $\theta \in (0, 1]$,

$$\|Z\|_q \le (1 + \theta)\,\mathbb{E}[Z] + \frac{\kappa}{2}\Big(1 + \frac{1}{\theta}\Big) q\,\|W\|_q.$$

Also,

$$\|(Z - \mathbb{E}[Z])_+\|_q \le \sqrt{2\kappa q\,\|W\|_q\,\mathbb{E}[Z]} + \kappa q\,\|W\|_q.$$

If $M$ denotes a positive random variable such that for every $1 \le i \le n$,

$$0 \le Z - Z_i \le M,$$

then we also have

$$\|(Z - \mathbb{E}[Z])_-\|_q \le \sqrt{C_2 q \Big(\|W\|_q\big(2\,\mathbb{E}[Z] + 2q\|W\|_q\big) \vee q\|M\|_q^2\Big)},$$

where $C_2 < 2.42$ is as in Theorem 4.

The proofs of Theorems 2-4 and of Corollaries 1-3 are developed in two steps. First, in Section 4, building on the modified φ-Sobolev inequalities presented in Section 3, generalized Efron-Stein-type moment inequalities are established. These modified φ-Sobolev/Efron-Stein inequalities play a role similar to the one played by modified log-Sobolev inequalities in the entropy method in [26, 27, 28] and [30]. Second, in Section 5 these general inequalities are used as main steps of an inductive proof of the main results. This second step may be regarded as an analogue of what is called in [28] the Herbst argument of the entropy method.
3. Modified φ-Sobolev inequalities. The purpose of this section is to reveal some fundamental connections between φ-entropies and modified φ-Sobolev inequalities. The basic result is the duality formula of Lemma 1 implying the tensorization inequality which is at the basis of the modified φ-Sobolev inequalities of Theorems 5 and 6. These theorems immediately imply the generalized Efron-Stein inequalities of Lemmas 3-5.

3.1. φ-entropies, duality and the tensorization property. First we investigate so-called “tensorization” inequalities due to Latała and Oleszkiewicz [25] and Bobkov (see [26]). At the time of writing this text, Chafaï [11] developed a framework for φ-entropies and φ-Sobolev inequalities.

We introduce some notation. Let $\mathbb{L}_1^+$ denote the convex set of nonnegative and integrable random variables $Z$. For any convex function $\phi$ on $\mathbb{R}_+$, let the φ-entropy functional $H_\phi$ be defined for $Z \in \mathbb{L}_1^+$ by

$$H_\phi(Z) = \mathbb{E}[\phi(Z)] - \phi(\mathbb{E}[Z]).$$

Note that here and below we use the extended notion of expectation for a (not necessarily integrable) random variable $X$ defined as $\mathbb{E}[X] = \mathbb{E}[X_+] - \mathbb{E}[X_-]$ whenever either $X_+$ or $X_-$ is integrable.

The functional $H_\phi$ is said to satisfy the tensorization property if for every finite family $X_1, \ldots, X_n$ of independent random variables and every $(X_1, \ldots, X_n)$-measurable nonnegative and integrable random variable $Z$,

$$H_\phi(Z) \le \sum_{i=1}^n \mathbb{E}\Big[\mathbb{E}\big[\phi(Z) \mid X^{(i)}\big] - \phi\big(\mathbb{E}[Z \mid X^{(i)}]\big)\Big].$$
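The tensorization property can be checked exactly when the $X_i$ take finitely many values. The sketch below is our own illustration. It assumes i.i.d. fair bits and uses an arbitrary nonnegative test function `g`. The conditional φ-entropy given $X^{(i)}$ is computed by letting only the $i$th bit vary. For $\phi(x) = x^2$ and $g$ equal to the sum, both sides coincide, which is the equality case of the Efron-Stein inequality.

```python
import itertools
import math


def phi_entropy(values, probs, phi):
    """H_phi(Z) = E[phi(Z)] - phi(E[Z]) for a discrete Z with the given law."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * phi(v) for v, p in zip(values, probs)) - phi(mean)


def tensorization_sides(g, n, phi):
    """Return (H_phi(Z), sum_i E[H_phi^{(i)}(Z)]) for Z = g(X_1, ..., X_n),
    where X_1, ..., X_n are i.i.d. fair bits (an assumption of this sketch)."""
    xs = list(itertools.product((0, 1), repeat=n))
    lhs = phi_entropy([g(x) for x in xs], [1 / len(xs)] * len(xs), phi)
    rhs = 0.0
    for i in range(n):
        for x in xs:
            if x[i] == 1:
                continue  # enumerate each value of X^{(i)} once (representative with x_i = 0)
            flipped = x[:i] + (1,) + x[i + 1:]
            # Conditional phi-entropy of Z given X^{(i)}: only X_i varies, uniformly on {0, 1}.
            rhs += phi_entropy([g(x), g(flipped)], [0.5, 0.5], phi) / 2 ** (n - 1)
    return lhs, rhs


g = lambda x: 1 + sum(x) + 3 * x[0] * x[1]  # an arbitrary nonnegative test function
print(tensorization_sides(g, 4, lambda t: t ** 1.5))
print(tensorization_sides(g, 4, lambda t: t * math.log(t)))
```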

Observe that for $n = 2$ and setting $Z = g(X_1, X_2)$, the tensorization property reduces to the Jensen-type inequality

$$H_\phi\Big(\int g(x, X_2)\, d\mu_1(x)\Big) \le \int H_\phi\big(g(x, X_2)\big)\, d\mu_1(x), \tag{3.1}$$

where $\mu_1$ denotes the distribution of $X_1$. Next we show that (3.1) implies the tensorization property. Indeed let $Y_1$ be distributed like $X_1$, and $Y_2$ be distributed like the $(n-1)$-tuple $X_2, \ldots, X_n$. Let $\mu_1$ and $\mu_2$ denote the corresponding distributions. The random variable $Z$ is a measurable function $g$ of the two independent random variables $Y_1$ and $Y_2$. By the Tonelli-Fubini theorem,

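$$
\begin{aligned}
H_\phi(Z) &= \iint \Big(\phi\big(g(y_1, y_2)\big) - \phi\Big(\int g(y_1', y_2)\, d\mu_1(y_1')\Big) \\
&\qquad + \phi\Big(\int g(y_1', y_2)\, d\mu_1(y_1')\Big) - \phi\Big(\iint g(y_1', y_2')\, d\mu_1(y_1')\, d\mu_2(y_2')\Big)\Big)\, d\mu_1(y_1)\, d\mu_2(y_2) \\
&= \int \Big[\int \phi\big(g(y_1, y_2)\big)\, d\mu_1(y_1) - \phi\Big(\int g(y_1', y_2)\, d\mu_1(y_1')\Big)\Big]\, d\mu_2(y_2) \\
&\qquad + \int \phi\Big(\int g(y_1', y_2)\, d\mu_1(y_1')\Big)\, d\mu_2(y_2) - \phi\Big(\iint g(y_1', y_2)\, d\mu_1(y_1')\, d\mu_2(y_2)\Big) \\
&= \int H_\phi\big(g(Y_1, y_2)\big)\, d\mu_2(y_2) + H_\phi\Big(\int g(y_1', Y_2)\, d\mu_1(y_1')\Big) \\
&\le \int H_\phi\big(g(Y_1, y_2)\big)\, d\mu_2(y_2) + \int H_\phi\big(g(y_1', Y_2)\big)\, d\mu_1(y_1'),
\end{aligned}
$$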

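where the last step follows from the Jensen-type inequality (3.1). If we turn back to the original notation, we get

$$H_\phi(Z) \le \mathbb{E}\Big[\mathbb{E}\big[\phi(Z) \mid X^{(1)}\big] - \phi\big(\mathbb{E}[Z \mid X^{(1)}]\big)\Big] + \int H_\phi\big(Z(x_1, X_2, \ldots, X_n)\big)\, d\mu_1(x_1).$$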


Proceeding by induction, (3.1) leads to the tensorization property for every $n$. We see that the tensorization property for $H_\phi$ is equivalent to what we could call the Jensen property, that is, (3.1) holds for every $\mu_1$, $X_2$ and $g$ such that $\int g(x, X_2)\, d\mu_1(x)$ is integrable.

Let $\Phi$ denote the class of functions $\phi$ which are continuous and convex on $\mathbb{R}_+$, twice differentiable on $\mathbb{R}_+^*$, and such that either $\phi$ is affine or $\phi''$ is strictly positive and $1/\phi''$ is concave.

It is shown in [25] (see also [26]) that there is a tight connection between the convexity of $H_\phi$ and the tensorization property. Also, $\phi \in \Phi$ implies the convexity of $H_\phi$; see [25]. However, this does not straightforwardly lead to Jensen’s property when the distribution $\mu_1$ in (3.1) is not discrete. (See Appendix A.1 for an account of the consequences of the convexity of φ-entropy.)

The easiest way to establish that for some function $\phi$ the functional $H_\phi$ satisfies the Jensen-like property is by following the lines of Ledoux’s proof of the tensorization property for the “usual” entropy [which corresponds to the case $\phi(x) = x\log(x)$] and mimicking the duality argument used in one dimension to prove the usual Jensen’s inequality, that is, to express $H_\phi$ as a supremum of affine functions.

Provided that $\phi \in \Phi$, our next purpose is to establish a duality formula for φ-entropy of the form

$$H_\phi(Z) = \sup_{T \in \mathcal{T}} \mathbb{E}\big[\psi_1(T) Z + \psi_2(T)\big],$$

for convenient functions $\psi_1$ and $\psi_2$ on $\mathbb{R}_+$ and a suitable class of nonnegative variables $\mathcal{T}$. Such a formula obviously implies the convexity of $H_\phi$ but also Jensen’s property and therefore the tensorization property for $H_\phi$. Indeed, considering again $Z$ as a function of $Y_1 = X_1$ and $Y_2 = (X_2, \ldots, X_n)$ and assuming that a duality formula of the above form holds, we have




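$$
\begin{aligned}
H_\phi\Big(\int g(y_1, Y_2)\, d\mu_1(y_1)\Big)
&= \sup_{T \in \mathcal{T}} \int \Big[\psi_1\big(T(y_2)\big) \int g(y_1, y_2)\, d\mu_1(y_1) + \psi_2\big(T(y_2)\big)\Big]\, d\mu_2(y_2) \\
&= \sup_{T \in \mathcal{T}} \int \Big(\int \Big[\psi_1\big(T(y_2)\big)\, g(y_1, y_2) + \psi_2\big(T(y_2)\big)\Big]\, d\mu_2(y_2)\Big)\, d\mu_1(y_1) \quad \text{(by Fubini)} \\
&\le \int \Big(\sup_{T \in \mathcal{T}} \int \Big[\psi_1\big(T(y_2)\big)\, g(y_1, y_2) + \psi_2\big(T(y_2)\big)\Big]\, d\mu_2(y_2)\Big)\, d\mu_1(y_1) \\
&= \int H_\phi\big(g(y_1, Y_2)\big)\, d\mu_1(y_1).
\end{aligned}
$$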


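Lemma 1. Let $\phi \in \Phi$ and $Z \in \mathbb{L}_1^+$. If $\phi(Z)$ is integrable, then

$$H_\phi(Z) = \sup_{T \in \mathbb{L}_1^+,\, T \ne 0} \Big\{\mathbb{E}\big[(\phi'(T) - \phi'(\mathbb{E}[T]))(Z - T) + \phi(T)\big] - \phi(\mathbb{E}[T])\Big\}.$$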

Remark. This duality formula is almost identical to Proposition 4 in [11]. However, the proofs have a different flavor. The proof given here is elementary.
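The duality formula can be illustrated numerically for $\phi(x) = x^{3/2}$. This function belongs to $\Phi$ since $1/\phi''(x) = \frac{4}{3}\sqrt{x}$ is concave. The sketch below is our own check. The finite probability space and the random choices of $T$ are assumptions made for illustration, and `dual_objective` is a hypothetical helper. In this example, the objective inside the supremum never exceeds $H_\phi(Z)$ and equals it at $T = Z$.

```python
import random


def expect(values, probs):
    return sum(p * v for v, p in zip(values, probs))


def dual_objective(z, t, probs, phi, dphi):
    """E[(phi'(T) - phi'(E[T]))(Z - T) + phi(T)] - phi(E[T]) for random variables
    Z, T on a finite space where outcome k has probability probs[k]."""
    et = expect(t, probs)
    inner = [(dphi(tk) - dphi(et)) * (zk - tk) + phi(tk) for zk, tk in zip(z, t)]
    return expect(inner, probs) - phi(et)


phi = lambda x: x ** 1.5         # phi(x) = x^p with p = 1.5, an element of Phi
dphi = lambda x: 1.5 * x ** 0.5  # its derivative
rng = random.Random(0)
probs = [1 / 6] * 6
z = [rng.uniform(0.1, 5.0) for _ in range(6)]
h = expect([phi(v) for v in z], probs) - phi(expect(z, probs))  # H_phi(Z)
best_random_t = max(dual_objective(z, [rng.uniform(0.01, 10.0) for _ in range(6)],
                                   probs, phi, dphi) for _ in range(1000))
print(h, dual_objective(z, z, probs, phi, dphi), best_random_t)
```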


Proof. The case when $\phi$ is affine is trivial: $H_\phi$ equals zero, and so does the expression defined by the duality formula.

Note that the expression within the brackets on the right-hand side equals $H_\phi(Z)$ for $T = Z$, so the proof of Lemma 1 amounts to checking that

$$H_\phi(Z) \ge \mathbb{E}\big[(\phi'(T) - \phi'(\mathbb{E}[T]))(Z - T) + \phi(T)\big] - \phi(\mathbb{E}[T])$$

under the assumption that $\phi(Z)$ is integrable and $T \in \mathbb{L}_1^+$.

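Assume first that $Z$ and $T$ are bounded and bounded away from 0. For any $\lambda \in [0, 1]$, we set $T_\lambda = (1 - \lambda) Z + \lambda T$ and

$$f(\lambda) = \mathbb{E}\big[(\phi'(T_\lambda) - \phi'(\mathbb{E}[T_\lambda]))(Z - T_\lambda)\big] + H_\phi(T_\lambda).$$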

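Our aim is to show that $f$ is nonincreasing on $[0, 1]$. Noticing that $Z - T_\lambda = \lambda(Z - T)$ and using our boundedness assumptions to differentiate under the expectation, we have

$$
\begin{aligned}
f'(\lambda) &= -\lambda\Big[\mathbb{E}\big[(Z - T)^2 \phi''(T_\lambda)\big] - \big(\mathbb{E}[Z - T]\big)^2 \phi''(\mathbb{E}[T_\lambda])\Big] \\
&\quad + \mathbb{E}\big[(\phi'(T_\lambda) - \phi'(\mathbb{E}[T_\lambda]))(Z - T)\big] \\
&\quad + \mathbb{E}\big[\phi'(T_\lambda)(T - Z)\big] - \phi'(\mathbb{E}[T_\lambda])\,\mathbb{E}[T - Z],
\end{aligned}
$$

that is,

$$f'(\lambda) = -\lambda\Big[\mathbb{E}\big[(Z - T)^2 \phi''(T_\lambda)\big] - \big(\mathbb{E}[Z - T]\big)^2 \phi''(\mathbb{E}[T_\lambda])\Big].$$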


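Now, by the Cauchy-Schwarz inequality,

$$\big(\mathbb{E}[Z - T]\big)^2 = \Big(\mathbb{E}\Big[(Z - T)\sqrt{\phi''(T_\lambda)}\,\frac{1}{\sqrt{\phi''(T_\lambda)}}\Big]\Big)^2 \le \mathbb{E}\Big[\frac{1}{\phi''(T_\lambda)}\Big]\, \mathbb{E}\big[(Z - T)^2 \phi''(T_\lambda)\big].$$

Using the concavity of $1/\phi''$, Jensen’s inequality implies that

$$\mathbb{E}\Big[\frac{1}{\phi''(T_\lambda)}\Big] \le \frac{1}{\phi''(\mathbb{E}[T_\lambda])},$$

which leads to

$$\big(\mathbb{E}[Z - T]\big)^2 \le \frac{1}{\phi''(\mathbb{E}[T_\lambda])}\, \mathbb{E}\big[(Z - T)^2 \phi''(T_\lambda)\big],$$

which is equivalent to $f'(\lambda) \le 0$ and therefore $f(1) \le f(0) = H_\phi(Z)$. This means that for any $T$, $\mathbb{E}\big[(\phi'(T) - \phi'(\mathbb{E}[T]))(Z - T)\big] + H_\phi(T) \le H_\phi(Z)$.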

In the general case we consider the sequences $Z_n = (Z \vee 1/n) \wedge n$ and $T_k = (T \vee 1/k) \wedge k$ and our purpose is to take the limit, as $k, n \to \infty$, in the inequality

$$H_\phi(Z_n) \ge \mathbb{E}\big[(\phi'(T_k) - \phi'(\mathbb{E}[T_k]))(Z_n - T_k) + \phi(T_k)\big] - \phi(\mathbb{E}[T_k]),$$

which we can also write as

$$\mathbb{E}\big[\psi(Z_n, T_k)\big] \ge -\phi'(\mathbb{E}[T_k])\,\mathbb{E}[Z_n - T_k] - \phi(\mathbb{E}[T_k]) + \phi(\mathbb{E}[Z_n]), \tag{3.2}$$

where $\psi(z, t) = \phi(z) - \phi(t) - (z - t)\phi'(t)$. Since we have to show that

$$\mathbb{E}\big[\psi(Z, T)\big] \ge -\phi'(\mathbb{E}[T])\,\mathbb{E}[Z - T] - \phi(\mathbb{E}[T]) + \phi(\mathbb{E}[Z]) \tag{3.3}$$

with $\psi \ge 0$, we can always assume $\psi(Z, T)$ to be integrable [since otherwise (3.3) is trivially satisfied]. Taking the limit when $n$ and $k$ go to infinity on the right-hand side of (3.2) is easy, while the treatment of the left-hand side requires some care. Note that $\psi(z, t)$, as a function of $t$, decreases on $(0, z)$ and increases on $(z, +\infty)$. Similarly, as a function of $z$, $\psi(z, t)$ decreases on $(0, t)$ and increases on $(t, +\infty)$. Hence, for every $t$, $\psi(Z_n, t) \le \psi(1, t) + \psi(Z, t)$, while for every $z$, $\psi(z, T_k) \le \psi(z, 1) + \psi(z, T)$. Hence, given $k$,

$$\psi(Z_n, T_k) \le \psi(1, T_k) + \psi(Z, T_k),$$

and, as $\psi((z \vee 1/n) \wedge n, T_k) \to \psi(z, T_k)$ for every $z$, we can apply the dominated convergence theorem to conclude that $\mathbb{E}[\psi(Z_n, T_k)]$ converges to $\mathbb{E}[\psi(Z, T_k)]$ as $n$ goes to infinity. Hence we have the following inequality:

$$\mathbb{E}\big[\psi(Z, T_k)\big] \ge -\phi'(\mathbb{E}[T_k])\,\mathbb{E}[Z - T_k] - \phi(\mathbb{E}[T_k]) + \phi(\mathbb{E}[Z]). \tag{3.4}$$
















Now we also have $\psi(Z, T_k) \le \psi(Z, 1) + \psi(Z, T)$ and we can apply the dominated convergence theorem again to ensure that $\mathbb{E}[\psi(Z, T_k)]$ converges to $\mathbb{E}[\psi(Z, T)]$ as $k$ goes to infinity. Taking the limit as $k$ goes to infinity in (3.4) implies that (3.3) holds for every $T, Z \in \mathbb{L}_1^+$ such that $\phi(Z)$ is integrable and $\mathbb{E}[T] > 0$. If $Z \ne 0$ a.s., (3.3) is achieved for $T = Z$, while if $Z = 0$ a.s., it is achieved for $T = 1$ and the proof of the lemma is now complete in its full generality. □


Remark. Note that since the supremum in the duality formula of Lemma 1 is achieved for $T = Z$ (or $T = 1$ if $Z = 0$), the duality formula remains true if the supremum is restricted to the class $\mathcal{T}_\phi$ of variables $T$ such that $\phi(T)$ is integrable. Hence the following alternative formula also holds:

$$H_\phi(Z) = \sup_{T \in \mathcal{T}_\phi} \Big\{\mathbb{E}\big[(\phi'(T) - \phi'(\mathbb{E}[T]))(Z - T)\big] + H_\phi(T)\Big\}. \tag{3.5}$$


Remark. The duality formula of Lemma 1 takes the following (known) form for the “usual” entropy [which corresponds to $\phi(x) = x\log(x)$]:

$$\operatorname{Ent}(Z) = \sup_T \Big\{\mathbb{E}\big[(\log(T) - \log(\mathbb{E}[T]))\,Z\big]\Big\},$$

where the supremum is extended to the set of nonnegative and integrable random variables $T$ with $\mathbb{E}[T] > 0$. Another case of interest is $\phi(x) = x^p$, where $p \in (1, 2]$. In this case, one has, by (3.5),

$$H_\phi(Z) = \sup_T \Big\{p\,\mathbb{E}\big[Z\big(T^{p-1} - (\mathbb{E}[T])^{p-1}\big)\big] - (p - 1) H_\phi(T)\Big\},$$

where the supremum is extended to the set of nonnegative variables in $\mathbb{L}_p$.


Remark. For the sake of simplicity we have focused on nonnegative variables and convex functions $\phi$ on $\mathbb{R}_+$. This restriction can be avoided and one may consider the case where $\phi$ is a convex function on $\mathbb{R}$ and define the φ-entropy of a real-valued integrable random variable $Z$ by the same formula as in the nonnegative case. Assuming this time that $\phi$ is differentiable on $\mathbb{R}$ and twice differentiable on $\mathbb{R} \setminus \{0\}$, the proof of the duality formula above can be easily adapted to cover this case provided that $1/\phi''$ can be extended to a concave function on $\mathbb{R}$. In particular, if $\phi(x) = |x|^p$, where $p \in (1, 2]$, one gets

$$H_\phi(Z) = \sup_T \Big\{p\,\mathbb{E}\Big[Z\Big(\frac{|T|^p}{T} - \frac{|\mathbb{E}[T]|^p}{\mathbb{E}[T]}\Big)\Big] - (p - 1) H_\phi(T)\Big\},$$

where the supremum is extended to $\mathbb{L}_p$. Note that for $p = 2$ this formula reduces to the classical one for the variance

$$\operatorname{Var}(Z) = \sup_T \big\{2\operatorname{Cov}(Z, T) - \operatorname{Var}(T)\big\},$$

where the supremum is extended to the set of square integrable variables. This means that the tensorization inequality for the φ-entropy also holds for convex functions $\phi$ on $\mathbb{R}$ under the condition that $1/\phi''$ is the restriction to $\mathbb{R} \setminus \{0\}$ of a concave function on $\mathbb{R}$.


3.2. From φ-entropies to φ-Sobolev inequalities. Recall that our aim is to derive moment inequalities based on the tensorization property of φ-entropy for an adequate choice of the function $\phi$ (namely, a properly chosen power function).

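As a training example, we show how to derive the Efron-Stein inequality cited in Proposition 1 and a variant of it from the tensorization inequality of the variance, that is, the φ-entropy when $\phi$ is defined on the whole real line as $\phi(x) = x^2$. Then

$$\operatorname{Var}(Z) \le \mathbb{E}\Big[\sum_{i=1}^n \mathbb{E}\big[(Z - \mathbb{E}[Z \mid X^{(i)}])^2 \mid X^{(i)}\big]\Big]$$

and since conditionally on $X^{(i)}$, $Z_i'$ is an independent copy of $Z$, one has

$$\mathbb{E}\big[(Z - \mathbb{E}[Z \mid X^{(i)}])^2 \mid X^{(i)}\big] = \frac{1}{2}\,\mathbb{E}\big[(Z - Z_i')^2 \mid X^{(i)}\big],$$

which leads to Proposition 1. A useful variant may be obtained by noticing that $\mathbb{E}[Z \mid X^{(i)}]$ is the best $X^{(i)}$-measurable approximation of $Z$ in $\mathbb{L}_2$, which leads to

$$\operatorname{Var}(Z) \le \sum_{i=1}^n \mathbb{E}\big[(Z - Z_i)^2\big] \tag{3.6}$$

for any family of square integrable random variables $Z_i$’s such that $Z_i$ is $X^{(i)}$-measurable.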

Next we generalize these symmetrization and variational arguments. The derivation of modified φ-Sobolev inequalities will rely on the following properties of the elements of $\Phi$. The proofs of Proposition 2 and Lemma 2 are given in Appendix A.1.


Proposition 2. If $\phi \in \Phi$, then both $\phi'$ and $x \mapsto (\phi(x) - \phi(0))/x$ are concave functions on $(0, \infty)$.


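Lemma 2. Let $\phi$ be a continuous and convex function on $\mathbb{R}_+$. Then, denoting by $\phi'$ the right derivative of $\phi$, for every $Z \in \mathbb{L}_1^+$ one has

$$H_\phi(Z) = \inf_{u \ge 0} \mathbb{E}\big[\phi(Z) - \phi(u) - (Z - u)\phi'(u)\big]. \tag{3.7}$$

Let $Z'$ be an independent copy of $Z$. Then

$$H_\phi(Z) \le \frac{1}{2}\,\mathbb{E}\big[(Z - Z')(\phi'(Z) - \phi'(Z'))\big] = \mathbb{E}\big[(Z - Z')_+(\phi'(Z) - \phi'(Z'))\big]. \tag{3.8}$$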






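If, moreover, $\psi: x \mapsto (\phi(x) - \phi(0))/x$ is concave on $\mathbb{R}_+^*$, then

$$H_\phi(Z) \le \frac{1}{2}\,\mathbb{E}\big[(Z - Z')(\psi(Z) - \psi(Z'))\big] = \mathbb{E}\big[(Z - Z')_+(\psi(Z) - \psi(Z'))\big]. \tag{3.9}$$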

Note that by Proposition 2, we can apply (3.9) whenever $\phi \in \Phi$. In particular, for our target example where $\phi(x) = x^p$, with $p \in (1, 2]$, (3.9) improves on (3.8) by a factor $p$.
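For $\phi(x) = x^p$ one has $\phi' = p\,\psi$ with $\psi(x) = x^{p-1}$, which is exactly why (3.9) gains the factor $p$. The sketch below is our own illustration on an arbitrary discrete distribution, and `symmetrization_bounds` is a hypothetical helper. It computes $H_\phi(Z)$ and both symmetrized bounds exactly by summing over all pairs $(Z, Z')$.

```python
import itertools


def symmetrization_bounds(values, probs, p):
    """For phi(x) = x**p with p in (1, 2], return H_phi(Z) together with the
    right-hand sides of (3.9) and (3.8) for a discrete Z >= 0 and an independent copy Z'."""
    phi = lambda x: x ** p
    dphi = lambda x: p * x ** (p - 1)  # phi'
    psi = lambda x: x ** (p - 1)       # (phi(x) - phi(0)) / x
    mean = sum(pr * v for v, pr in zip(values, probs))
    h = sum(pr * phi(v) for v, pr in zip(values, probs)) - phi(mean)
    rhs_39 = rhs_38 = 0.0
    for (z, pz), (zc, pzc) in itertools.product(list(zip(values, probs)), repeat=2):
        weight = pz * pzc * max(z - zc, 0.0)  # P(Z = z) P(Z' = zc) (z - zc)_+
        rhs_39 += weight * (psi(z) - psi(zc))
        rhs_38 += weight * (dphi(z) - dphi(zc))
    return h, rhs_39, rhs_38


print(symmetrization_bounds([0.0, 1.0, 2.5, 4.0], [0.1, 0.4, 0.3, 0.2], 1.5))
```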

Modified φ-Sobolev inequalities then follow from the tensorization inequality for φ-entropy, the variational formula and the symmetrization inequality. The goal is to upper bound the φ-entropy of a conveniently chosen convex function $f$ of the variable of interest $Z$. The results crucially depend on the monotonicity of the transformation $f$.

Theorem 5. Let $X_1, \ldots, X_n$ be independent random variables and let $Z$ be an $(X_1, \ldots, X_n)$-measurable random variable taking its values in an interval $I$. Let $V$, $V^+$ and $(Z_i)_{i \le n}$ be defined as in Section 2.1.

Let $\phi \in \Phi$ and let $f$ be a nondecreasing, nonnegative and differentiable convex function on $I$. Let $\psi$ denote the function $x \mapsto (\phi(x) - \phi(0))/x$. Then

$$H_\phi(f(Z)) \le \mathbb{E}\big[V^+ f'^2(Z)\, \psi'(f(Z))\big] \quad \text{if } \psi \circ f \text{ is convex}.$$

On the other hand, if $(Z_i)_{i \le n}$ satisfy $Z_i \le Z$ for all $i \le n$, then

$$H_\phi(f(Z)) \le \frac{1}{2}\,\mathbb{E}\big[V f'^2(Z)\, \phi''(f(Z))\big] \quad \text{if } \phi' \circ f \text{ is convex}.$$

Proof. First fix $x \le y$. Assume first that $g = \phi' \circ f$ is convex. We first check that

$$\phi(f(y)) - \phi(f(x)) - (f(y) - f(x))\,\phi'(f(x)) \le \frac{1}{2}(y - x)^2 f'^2(y)\, \phi''(f(y)). \tag{3.10}$$

Indeed, setting

$$h(t) = \phi(f(y)) - \phi(f(t)) - (f(y) - f(t))\, g(t),$$

we have

$$h'(t) = -g'(t)\,(f(y) - f(t)).$$

But for every $t \le y$, the monotonicity and convexity assumptions on $f$ and $g$ yield

$$0 \le g'(t) \le g'(y) \quad \text{and} \quad 0 \le f(y) - f(t) \le (y - t) f'(y),$$

hence

$$-h'(t) \le (y - t)\, f'(y)\, g'(y).$$

Integrating this inequality with respect to $t$ on $[x, y]$ leads to (3.10), since $h(y) = 0$ and $g'(y) = f'(y)\,\phi''(f(y))$.
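Inequality (3.10) is a pointwise statement and can be spot-checked on a grid. The sketch below is our own check, and the helper names are hypothetical. It uses the pair that later yields Lemma 4, namely $\phi(u) = u^{q/\alpha}$ and $f(z) = z^\alpha$ on $(0, \infty)$ with $q/2 \le \alpha \le q - 1$. Then $\phi' \circ f$ is a multiple of $z^{q-\alpha}$, which is convex because $q - \alpha \ge 1$.

```python
def lhs_310(x, y, q, alpha):
    """Left side of (3.10) for phi(u) = u**(q/alpha) and f(z) = z**alpha, 0 < x <= y."""
    r = q / alpha
    fx, fy = x ** alpha, y ** alpha
    return fy ** r - fx ** r - (fy - fx) * r * fx ** (r - 1)


def rhs_310(x, y, q, alpha):
    """Right side of (3.10): (1/2)(y - x)^2 f'(y)^2 phi''(f(y))."""
    r = q / alpha
    fy = y ** alpha
    f_prime_y = alpha * y ** (alpha - 1)
    phi_second = r * (r - 1) * fy ** (r - 2)
    return 0.5 * (y - x) ** 2 * f_prime_y ** 2 * phi_second


grid = [0.1 + 0.35 * k for k in range(15)]
worst = max(lhs_310(x, y, 4.0, 3.0) - rhs_310(x, y, 4.0, 3.0)
            for x in grid for y in grid if x <= y)
print(worst)  # expected to be <= 0 up to rounding, by (3.10)
```

For $q = 2$ and $\alpha = 1$ the two sides coincide, so (3.10) cannot be improved in general.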

Under the assumption that $\psi \circ f$ is convex,

$$0 \le f(y) - f(x) \le (y - x) f'(y)$$

and

$$0 \le \psi(f(y)) - \psi(f(x)) \le (y - x) f'(y)\, \psi'(f(y)),$$

which leads to

$$(f(y) - f(x))\big(\psi(f(y)) - \psi(f(x))\big) \le (x - y)^2 f'^2(y)\, \psi'(f(y)). \tag{3.11}$$

Now the tensorization inequality combined with the variational inequality (3.7) from Lemma 2 and (3.10) lead to

$$H_\phi(f(Z)) \le \frac{1}{2} \sum_{i=1}^n \mathbb{E}\big[(Z - Z_i)^2 f'^2(Z)\, \phi''(f(Z))\big]$$

and therefore to the second inequality of the theorem.

The first inequality of the theorem follows in a similar way from inequality (3.9) and from (3.11). □

The case when $f$ is nonincreasing is handled by the following theorem.

Theorem 6. Let $X_1, \ldots, X_n$ be independent random variables and let $Z$ be an $(X_1, \ldots, X_n)$-measurable random variable taking its values in some interval $I$. Let $\phi \in \Phi$ and let $f$ be a nonnegative, nonincreasing and differentiable convex function on $I$. Let $\psi$ denote the function $x \mapsto (\phi(x) - \phi(0))/x$. For any random variable $\tilde{Z} \le \min_{1 \le i \le n} Z_i$,

$$H_\phi(f(Z)) \le \frac{1}{2}\,\mathbb{E}\big[V f'^2(\tilde{Z})\, \phi''(f(\tilde{Z}))\big] \quad \text{if } \phi' \circ f \text{ is convex},$$

while if $\psi \circ f$ is convex, we have

$$H_\phi(f(Z)) \le \mathbb{E}\big[V^+ f'^2(\tilde{Z})\, \psi'(f(\tilde{Z}))\big]$$

and

$$H_\phi(f(Z)) \le \mathbb{E}\big[V^- f'^2(Z)\, \psi'(f(Z))\big].$$

The proof of Theorem 6 parallels the proof of Theorem 5. It is included in Appendix A.1 for the sake of completeness.

Remark. As a first illustration, we may derive the modified logarithmic Sobolev inequalities in [7] using Theorems 5 and 6. Indeed, letting $f(z) = \exp(\lambda z)$ and $\phi(x) = x\log(x)$ leads to

$$H_\phi(f(Z)) \le \lambda^2\, \mathbb{E}\big[V^+ \exp(\lambda Z)\big],$$

if $\lambda > 0$, while if $\lambda < 0$, one has

$$H_\phi(f(Z)) \le \lambda^2\, \mathbb{E}\big[V^- \exp(\lambda Z)\big].$$
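For $\lambda > 0$ the first of these inequalities can be verified exactly on a small example. The sketch below is our own illustration. The function $F$, the choice of Rademacher inputs and the helper names are assumptions. It takes $Z = \max_{k \le n} \sum_{j \le k} \varepsilon_j$ and computes $V^+$ exactly by averaging over the independent copy $\varepsilon_i'$, which keeps or flips $\varepsilon_i$ with probability $1/2$ each.

```python
import itertools
import math


def entropy_vs_bound(F, n, lam):
    """Exact Ent(exp(lam Z)) and lam^2 E[V^+ exp(lam Z)] for Z = F(eps),
    eps uniform on {-1, 1}^n (Rademacher inputs are an assumption of this sketch)."""
    points = list(itertools.product((-1, 1), repeat=n))
    w = 1.0 / len(points)

    def v_plus(x):
        # V^+ = sum_i E[(Z - Z_i')_+^2 | eps]: eps_i' keeps eps_i or flips it, each w.p. 1/2.
        z = F(x)
        return sum(0.5 * max(z - F(x[:i] + (-x[i],) + x[i + 1:]), 0.0) ** 2
                   for i in range(n))

    ys = [math.exp(lam * F(x)) for x in points]
    mean_y = w * sum(ys)
    ent = w * sum(y * math.log(y) for y in ys) - mean_y * math.log(mean_y)
    bound = lam ** 2 * w * sum(v_plus(x) * y for x, y in zip(points, ys))
    return ent, bound


def max_partial_sum(x):
    """Z = max_k (eps_1 + ... + eps_k), a simple nonlinear function of the signs."""
    return max(itertools.accumulate(x))


for lam in (0.1, 0.5, 1.0, 2.0):
    print(lam, entropy_vs_bound(max_partial_sum, 6, lam))
```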
4. Generalized Efron-Stein inequalities. The purpose of this section is to prove the next three lemmas which relate different moments of $Z$ to $V$, $V^+$ and $V^-$. These lemmas are generalizations of the Efron-Stein inequality.

Recall the definitions of $(X_i)$, $Z$, $(Z_i)$, $(Z_i')$, $V^+$, $V^-$, $V$ and the constants $\kappa$ and $K$, given in Section 2.1.

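Lemma 3. Let $q \ge 2$ be a real number and let $\alpha$ satisfy $q/2 \le \alpha \le q - 1$. Then

$$\mathbb{E}\big[(Z - \mathbb{E}[Z])_+^q\big] \le \mathbb{E}\big[(Z - \mathbb{E}[Z])_+^\alpha\big]^{q/\alpha} + \frac{q(q - \alpha)}{2}\,\mathbb{E}\big[V (Z - \mathbb{E}[Z])_+^{q-2}\big],$$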

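$$\mathbb{E}\big[(Z - \mathbb{E}[Z])_+^q\big] \le \mathbb{E}\big[(Z - \mathbb{E}[Z])_+^\alpha\big]^{q/\alpha} + \alpha(q - \alpha)\,\mathbb{E}\big[V^+ (Z - \mathbb{E}[Z])_+^{q-2}\big]$$

and

$$\mathbb{E}\big[(Z - \mathbb{E}[Z])_-^q\big] \le \mathbb{E}\big[(Z - \mathbb{E}[Z])_-^\alpha\big]^{q/\alpha} + \alpha(q - \alpha)\,\mathbb{E}\big[V^- (Z - \mathbb{E}[Z])_-^{q-2}\big].$$

Proof. Let $q$ and $\alpha$ be chosen in such a way that $1 \le q/2 \le \alpha \le q - 1$. Let $\phi(x) = x^{q/\alpha}$. Applying Theorem 5 with $f(z) = (z - \mathbb{E}[Z])_+^\alpha$ leads to the first two inequalities. Finally, we may apply the third inequality of Theorem 6 with $f(z) = (z - \mathbb{E}[Z])_-^\alpha$ to obtain the third inequality of the lemma. □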
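The second inequality of Lemma 3 can be checked exactly on a toy example. This is a hedged illustration of our own. $Z$ is the maximum of the partial sums of $n$ independent Rademacher signs, and $V^+$ is computed by averaging over the independent copy of each sign. The helper `lemma3_second_sides` is hypothetical.

```python
import itertools


def lemma3_second_sides(F, n, q, alpha):
    """Left and right sides of the second inequality of Lemma 3, computed exactly
    for Z = F(eps) with eps uniform on {-1, 1}^n."""
    pts = list(itertools.product((-1, 1), repeat=n))
    w = 1.0 / len(pts)
    ez = w * sum(F(x) for x in pts)

    def pos(t):
        return max(t, 0.0)

    def v_plus(x):
        # Flipping eps_i happens with probability 1/2 under the independent copy.
        z = F(x)
        return sum(0.5 * pos(z - F(x[:i] + (-x[i],) + x[i + 1:])) ** 2 for i in range(n))

    lhs = w * sum(pos(F(x) - ez) ** q for x in pts)
    first = (w * sum(pos(F(x) - ez) ** alpha for x in pts)) ** (q / alpha)
    second = alpha * (q - alpha) * w * sum(v_plus(x) * pos(F(x) - ez) ** (q - 2) for x in pts)
    return lhs, first + second


F = lambda x: max(itertools.accumulate(x))  # illustrative choice of F
print(lemma3_second_sides(F, 6, 4.0, 3.0))
```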

The next lemma is a variant of Lemma 3 that may be convenient when 
dealing with positive random variables. 

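Lemma 4. Let $q$ denote a real number, $q \ge 2$ and $q/2 \le \alpha \le q - 1$. If for all $i = 1, \ldots, n$,

$$0 \le Z_i \le Z \quad \text{a.s.},$$

then

$$\mathbb{E}[Z^q] \le \mathbb{E}[Z^\alpha]^{q/\alpha} + \frac{q(q - \alpha)}{2}\,\mathbb{E}\big[V Z^{q-2}\big].$$

Proof. The lemma follows by choosing $q$ and $\alpha$ such that $1 \le q/2 \le \alpha \le q - 1$, taking $\phi(x) = x^{q/\alpha}$ and applying Theorem 5 with $f(z) = z^\alpha$. □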
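A simple case where Lemma 4 applies is a sum of Bernoulli variables. With $Z = X_1 + \cdots + X_n$, $X_i \in \{0, 1\}$ and $Z_i = Z - X_i$, we have $0 \le Z_i \le Z$ and $V = \sum_i X_i^2 = Z$. The lemma then reads $\mathbb{E}[Z^q] \le \mathbb{E}[Z^\alpha]^{q/\alpha} + \frac{q(q-\alpha)}{2}\mathbb{E}[Z^{q-1}]$. The sketch below is our own illustration, and the parameters $n = 20$ and $p = 0.1$ are arbitrary. It evaluates both sides exactly from the binomial distribution.

```python
import math


def binom_moment(n, p, s):
    """E[Z^s] for Z ~ Binomial(n, p) and real s > 0."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) * k ** s for k in range(n + 1))


def lemma4_sides(n, p, q, alpha):
    """Both sides of Lemma 4 for a Binomial(n, p) sum, where V = Z."""
    lhs = binom_moment(n, p, q)
    rhs = binom_moment(n, p, alpha) ** (q / alpha) + q * (q - alpha) / 2 * binom_moment(n, p, q - 1)
    return lhs, rhs


print(lemma4_sides(20, 0.1, 4.0, 3.0))
```

With $q = 2$ and $\alpha = 1$ the lemma reduces to $\operatorname{Var}(Z) \le \mathbb{E}[Z]$, which is the familiar bound $np(1-p) \le np$.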

The third lemma will prove useful when dealing with lower tails. 

Lemma 5. If the increments $Z - Z_i$ or $Z - Z_i'$ are bounded by some positive random variable $M$, then

$$\mathbb{E}\big[(Z - \mathbb{E}[Z])_-^q\big] \le \mathbb{E}\big[(Z - \mathbb{E}[Z])_-^\alpha\big]^{q/\alpha} + \frac{q(q - \alpha)}{2}\,\mathbb{E}\big[V (Z - \mathbb{E}[Z] - M)_-^{q-2}\big]. \tag{4.1}$$

If the increments $Z - Z_i'$ are bounded by some positive random variable $M$, then

$$\mathbb{E}\big[(Z - \mathbb{E}[Z])_-^q\big] \le \mathbb{E}\big[(Z - \mathbb{E}[Z])_-^\alpha\big]^{q/\alpha} + \alpha(q - \alpha)\,\mathbb{E}\big[V^+ (Z - \mathbb{E}[Z] - M)_-^{q-2}\big]. \tag{4.2}$$

Proof. If the increments $Z - Z_i$ or $Z - Z_i'$ are upper bounded by some positive random variable $M$, then we may also use the alternative bounds for the lower deviations stated in Theorem 6 to derive both inequalities. □

To obtain the main results of the paper, the inequalities of the lemmas 
above may be used by induction on the order of the moment. The details 
are worked out in the next section. 


5. Proof of the main theorems. We are now prepared to prove Theorems 
1-3 and Corollaries 1 and 3. 

To illustrate the method of proof on the simplest possible example, first we 
present the proof of Theorem 1 . This proof relies on a technical lemma proved 
in Appendix A.2. Recall from Section 2.1 that K is defined as l/(e — -v/e). 


Lemma 6. For all integers g > 4, the sequence 


q^Xq = 


is hounded by 1. Also, limg_>oo3:g = 1- 


\ I ^ \ 


K\q-l 


Proof of Theorem 1. To prove the first inequality, assume that < 
c. Let mq be defined by 


mq=\\{Z-E[Z])+\\q. 

For q>3, we obtain from the second inequality of Lemma 3, with a = q — 1, 

(5.1) m^<m^_i + c(g-l)m^l2. 

Our aim is to prove that 

(5.2) ml < {Kqcfl‘^ for q>2. 

To this end, we proceed by induction. For q = 2, note that by the Efron- 
Stein inequality, 

772,2 ^ 1E[P’''] < C 

and therefore (5.2) holds for q = 2. 
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Taking q = 3, since mi < m 2 < ^/c, we derive from (5.1) that 

This implies that (5.2) also holds for q = 3. 

Consider now q > 4 and assume that 

mj < \/Kjc 

for every j <q — 1. Then, it follows from (5.1) and two applications of the 
induction hypothesis that 

i^g /2 


q — l{\/q— 1 Y -—— l){-\/q — 2 )^ ^ 


= {Kqcy/^ 


q-iy/"^ ^q-l fq- 2 


q 


-) 


Kq 


q 




2 ^ 9 - 2)72 


K\q-1, 

The hrst part of the theorem then follows from Lemma 6 . 

To prove the second part, note that if, in addition, V~ < c, then applying 
the first inequality to —Z, we obtain 

\\{Z-E[Z]U\g<K^. 

The statement follows by noting that 

E[\Z - E[Z]|9] = E[{Z - E[Z])l] + E[{Z - E[Z]f_\ < 2{K^f. □ 

The proof of Theorems 2 and 3, given together below, is very similar to 
the proof of Theorem 1 above. 


Proof of Theorems 2 and 3. It suffices to prove the hrst inequality 
of Theorems 2 and 3 since the second inequality of Theorem 2 follows from 
the hrst by replacing Z by —Z. 

We intend to prove by induction on k that for all integers A: > 1, all 
5 G (/c, A; + 1], 

\\{Z -E[Z])+\\q< ^qKqCq, 

where either Cq = ||P||g/2vi or Cq = 2||y+||g/2vi(l - l/q)- 

For A = 1, it follows from Holder’s inequality, the Efron-Stein inequality 
and its variant (3.6) that 

||(Z - E[Z])+\\q < v/2p^l < ^2Kq\\V+\\iyq/2 

and 

||(Z - E[Z])+\\q < V||H||iv,/2 < VaC,||F||iv,/2. 
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Assume the property holds for all integers smaller than some > 1, and 
let us consider g G {k,k + V\. Holder’s inequality implies that for every non¬ 
negative random variable Y, 

E[y(z-E[z])r2]<||y||,/2||(z-E[z])+||^-2, 

hence, using the first and second inequalities of Lemma 3 with a = q — \, we 
get 

ii(z - nz])+\\i < ii(^ - nz])+\\u+\c,\\{z - nz])+\\V. 

Defining 

x, = ||(Z-E[Z])+||^(gK,c,)-'?/2, 

it suffices to prove that Xq<\. With this notation the previous inequality 
becomes 


T < <?/9-D -iXgA g/2 g/2 ,l l-2/q q/2 q/2 q/2-l 

Xqq Cq \q L) Cq_iKq_i 2Xq q ^q l^q ) 

from which we derive, since Cq-i < Cg and < Kg, 


x„< X 


9 / 9-1 


1^/2 

1__ 

qJ 


1 


2Kg ^ 


r-1-2/9 


— -^q-l 

Assuming, by induction, that Xg-i < 1, the previous inequality implies that 


I, < 1 1 - iV'""+Ai-;-"''’ 


qJ 


2Kg « 


Since the function 


fq-X 


^ q) 


1\9/2 


+ 




-x^-^/^-x 


is strictly concave on M+ and positive at a: = 0, /q(l) = 0 and fq{xg) > 0 
imply that Xg < 1 as desired. □ 


Proof of Theorem 4. We use the notation niq = \\{Z — EZ)_||g. For 
a > 0, the continuous function 

x^e-'/2 + —eVAi-l 

ax 

decreases from +00 to — 1 < 0 on (0,-t-oo). Define Ca as the unique 

zero of this function. 

Since Ci and C 2 are larger than 1/2, it follows from Holder’s inequality, 
the Efron-Stein inequality and its variant (3.6) that for q G [1,2], 


||(Z - nz])-\\q < < A 2 Atg||H+||iv ,/2 
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and 


||(Z - E[Z])_ II, < V||^||iv ,/2 < V^,niivg/ 2 . 

In the rest of the proof the two cases may be dealt with together. The 
first case, belonging to the first assumption of Theorem 4, corresponds to 
a = 1, while the second corresponds to o = 2. Thus, we define 

^ f ll^■^lllVQ /2 Vg||M||2, when 0 = 1, 

\ ll^llivg /2 Vg||M||2, when 0 = 2. 

For g > 2, either (4.2) or (4.1) with a = q — 1 implies 

(5.3) ml < ml_^ + qE[V^{{Z - EZ)_ + M)''”^] 
and 

(5.4) ml <ml_^ + |e[F((Z - EZ)_ + M)'^“^]. 

We first deal with the case q £ [2,3). By the subadditivity of x —> for 

q £ [2,3], we have 

{{Z - EZ)_ + + (Z - E[Z])r^ 

Using Holder’s inequality, we obtain from (5.3) and (5.4) that 

ml < ml_^ + q\\M\\l-^\\V+\\q/2 + g||^+||,/2"ir^ 

and 

m’ < m»_i + |||A/||3-"||y ||,/2 + |l|V'||,/2m’-=. 

Using the fact that m,_i < ^Cq-i < those two latter inequalities imply 


„2—5/2 

ml < + -Cqml~‘^. 

^ ^ a ^ a ^ 


Let Xq = ( y^gg )'^i then the preceding inequality translates into 


Xq < 




which in turn implies 




since q>2 and Cq > 1 . 
The function 


gq-x 


ir + -U(l + ^'W-. 


2Ca clC, 














22 


BOUCHERON, BOUSQUET, LUGOSI AND MASSART 


is strictly concave on M+ and positive at 0. Furthermore, 


ffg(l) 


4 + a 
2^ 


- 1 < 0 , 


since Ca > (4 + o) /2a. Hence Qq can be nonnegative at point Xq only if 
< 1, which settles the case g G [2,3]. 

We now turn to the case q>3. We will prove by induction on k>2 that 
for all q £ [k,k + l), niq < ^/qCaUqCg. By the convexity of x —> we have, 

for every 9 G (0,1), 


((Z-EZ)_ 


l^^ {Z-EZ). 


+ {1-9) 


M \ 
1^9) 


q-2 


< ^ _ 0yg+3(^z - E[Z])!r^. 


Using Holder’s inequality, we obtain from (5.3) and (5.4) that 

m? < mj-i + qll-+\M\\+-\\V+\\,i,_ + q(\ - e)-‘‘+'^\\V+\\,,qm+ 

and 

< mLi + |«"'+’’ll«lir’l|r|l,/2 + |(1 - ||,/2m|-^ 

Now assume by induction that < VCa{q — l)cq_i. Since Cq-i < Cq, 

we have 

< Cl^\q - l)''/2c^/2 + 15-9+20-9+3^9/2^9/2 + 1^(1 _ 0)-9+3 9-2^ 

Let Xq = Ca '^^^m9(5Cq)-9/2. Then it suffices to show that < 1 for all q> 2. 
Observe that 

/ 1 \ q/2 1 _ 

Xq<[l--j + ^(0-‘'+'(v^g)“''+' + (1 - 9)-^+^-^/<^). 

We choose 9 minimizing 

g{9) = r^+"(v^ 5 )-''+" + (1 - 0)-9+3, 


that is, 0 = l/{\/'C^q + 1). Since for this value of 9 


1 \ 9-2 


the bound on Xq becomes 


1X9/2 


Xq < 1- + 


qj aCa 


1 + 


1 \9-2 


V^aq 


1 + 


^ f^l-2lq _ 


1 + 


V^aq) 


1 ) 
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Hence, using the elementary inequalities 




we get 


qJ 


Xq < e 


VC-aQ 


q-2 


< e 


l/v^ 


Since the function 


fq-X 


=-1/2 




dC a 


— X 


is strictly concave on M+ and positive at 0 and Ca is defined in such a way 
that /g(l) = 0, fq can be nonnegative at Xq only if Xq < 1, which proves the 
theorem by induction. □ 

Proof of Corollary 1. Applying Lemma 4 with a = q — 1 leads to 


1 ^ 11 ^ < 11 ^ 11^-1 




But by assumption, we have V < AZ, and therefore 

Il7lli^ll7lp I II viig—1 
\\Z\\q < \\Z\\g_^ + —\\Z\\g_^ 

qA 


< II 7lP 


1 + 


nz\\q_u 

Since for any nonnegative real number u, 1 + ug < (1 + for u > 0, 

A 


\zrq<\\z\\l_,[l + 


AZ\\q-l 


or, equivalently. 


Z\\q < ||.Z^||o-l + —• 


Thus, \\Z\\q < \\Z\\i + {A/2){q — 1) by induction, and (2.1) follows. 
To prove (2.2), note first that by Theorem 3, 

\\{Z - EZ)+||g < VKg||H||g/2 < ^K,qA\\Z\\q/2. 

Let s be the smallest integer such that g/2 < s. Then (2.1) yields 
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SO that 


\\{Z-E[Z])+\\,<V^ 


I qAE[Z] + 




< 




2 


and inequality (2.2) follows. 

In order to prove the last inequality of Corollary 1, we first define C as 
the unique positive root of the equation 


2C 


We derive from the upper bound V < AZ and the modified Efron-Stein 
inequality (3.6) that 

(E|Z - EZ\f < E[{Z - EZf] < AEZ. 


Since C > 1, this proves the inequality for = 1 and q = 2. For q >3, we 
assume, by induction, that rrik < \/ CkAE[Z] for k = q —2 and k = q—1 and 
use V < AZ together with (4.1) with a = q — 1. This gives 

ml < ml_, + ^AE[Z{{Z - E[Z])_ + l)''-^]. 

Recall Chebyshev’s negative association inequality which asserts that if / is 
nondecreasing and g is nonincreasing, then 


E[fg]<E[f]E[g]. 

Since the function z ^ {{z — E[Z])_ + 1)'^“^ decreases, by Chebyshev’s neg¬ 
ative association inequality, the previous inequality implies 

ml < + ^AE[Z]E[{{Z - EZ)_ + 1)''-^]. 

Thus, this inequality becomes 

ml < -L |^E[Z](1 -L mq-2Y~‘^ 

and therefore our induction assumption yields 


ml < 




^CqAE[Z\ ^ V ^ q 


q-2 


Now we use the fact that since Z is nonnegative, niq < EZ. Then we may 
always assume that CqA < EZ, since otherwise the last inequality of Corol¬ 
lary 1 is implied by this crude upper bound. Combining this inequality with 
A> 1 leads to 

1 ^ 1 

^CqAE[Z\ - 
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so that plugging this in the inequality above and setting Xq = m'^{CqAEZ) 
we derive that 


Xq<[l- 


l \<//2 1 / I 


g-2 


+ 


2C\Cq 


+ \ 1 — 


Now we claim that 
(5.5) 


q-2 


Cq 


+ \ I — 




q 


Indeed, (5.5) may be checked numerically for q = 3, while for (7 > 4, combin¬ 
ing 


2 11 

'!__<! -^ 

q q 2q^ 


with ln(l + u) <u leads to 


In 


IS 


q-2- 


+ \ 1 — 


^ . 1 1/3 1 2 

G q\2 q C 
1 1/7 2\ 

^-' + c + ^U-c)' 


which, since C <8/7, implies (5.5). Hence 


Xq<\l- 


1 --) ^ + J_e-i+i/C'< g-1/2 J_g-1+1/C^ 


qJ 


2C 


2C 


which, by definition of C, means that Xg < 1, completing the proof of the 
third inequality. □ 


Proof of Corollary 2. This corollary follows by noting that if V 
is bounded by a nondecreasing function of Z, then by negative association, 

E[p-(Z-E[Z])r^] 

<E[g(Z)(Z-ElZ])r^] 

<E[g(Z)]E[(Z-ElZ])r^]. 

Thus, writing 

mg=||(Z-E[Z])_||g, 

we have 

ml < -L E[g{Z)]{q - 
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This recursion is identical to the one appearing in the proof of Theorem 1, 
so the rest of the proof is identical to that of Theorem 1. □ 

Proof of Corollary 3. Let g be a number, q>2. Let 6 > 0. Then 


||(Z -E[Z])+||q <VK.q\\WZ\\q/2 (by Theorem 3) 


< V «:(?||Z||q||lT||g (by Holder’s inequality) 



9 


[for 0 > 0, since < (a^ + b‘^)/2 for a, 6 > 0]. 


Now Z > 0 implies that || (Z — E[Z])_||q < K[Z] and we have \\Z\\q < K[Z] + 
II (Z — E[Z])+||q. Hence, for 0 < 0 < 1, 




concluding the proof of the first statement. To prove the second inequality, 
note that 

||(Z-E[Z])+||, 

< VKq\\WZ\\q/2 (by Theorem 3) 

< VK( 7 ||VL||q||Z||q (by Holder’s inequality) 

< VK( 7 || VL||q(2E[Z] + «;(?||fT||g) (by the first inequality with 9 = 1) 

<V2Kq\\W\\qE[Z]+Kq\\W\\q, 

as desired. □ 

6. Sums of random variables. In this section we show how the results 
stated in Section 2 imply some classical moment inequalities for sums of in¬ 
dependent random variables such as the Khinchine-Kahane, Marcinkiewicz 
and Rosenthal inequalities. In all cases, the proof basically does not require 
any further work. Also, we obtain explicit constants which only depend on q. 
These constants are not optimal, though in some cases their dependence on 
q is of the right order. For more information on these and related inequalities 
we refer to [13]. 

The simplest example is the case of Khinchine’s inequality: 
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Theorem 7 (Khinchine’s inequality). Let ai,... ,an be nonnegative con¬ 
stants, and let Xi,...,Xn be independent Rademacher variables (i.e., with 
P{Xj = —1} = P{Xj = 1} = 1/2). If Z = then for any integer 

q>2, 


\\{z)+\\, = \\izu\,<VW^, 






and 


|ZL 




where K = l/{e — -y/e) < 0.935. 


Proof. We may use Theorem 1. Since 

n n n 

V+ = ^E[(a,(W - 4))i|W] = 2Y^alla,x,>o < 2^a^ 

2=1 2 = 1 2 = 1 

the result follows. □ 


Note also that using a symmetrization argument (see, e.g., [13], Lemma 1.2.6), 
Khinchine’s inequality above implies Marcinkiewicz’s inequality: if Xi ,..., X^ 
are independent centered random variables, then for any q>2, 




< 2'+'/V2A'9 

Q 




q/2 


The next two results are Rosenthal-type inequalities for sums of indepen¬ 
dent nonnegative and centered random variables. The following inequality 
is very similar to inequality (Hr) in [16] which follows from an improved 
Hoffmann-Jprgensen inequality of [24]. Note again that we obtain the result 
without further work. 


Theorem 8. Define 

n 

z = Y.Xi, 

2 = 1 

where Xi are independent and nonnegative random variables. Then for all 
integers g > 1 and 9 G (0,1), 


\\{Z -E[Z])+\\q < J2Kq 


max \Xi\ 

E[Z\ -\- Kq 

max \Xi\ 

i 

q 

i 


\\{Z-E[Z]U\q< KqY^ElXf] 
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and 



max Xi 

l<i<n 


Proof. We may use Corollary 3 to get the first and the third inequali¬ 
ties; just note that 


n 


V = Y,Xf <wz, 


where 


W = max Xi. 

l<i<n 

In order to get the second inequality, just observe that 


V-<Y,E[X% 


and apply Theorem 1 to —Z. □ 

Next we use the previous result to derive a Rosenthal-type inequality 
for sums of centered variables. In spite of the simplicity of the proof, the 
dependence of the constants on q matches the best known bounds. (See [35] 
which extends the theorem below for martingales.) 

Theorem 9. Let Xi, i = 1,... ,n, be independent centered random vari¬ 
ables. Define 


n 


Z = Y^X,, a2 = ^E[x2], y=max|W 


Then for any integer q>2 and 9 G (0,1), 



Proof. We nse Theorem 2. Note that 


=Y^x^+Y^nx^'\ 


Thus 


||(^)+||g < V2«:g||y+||g/2 (by Theorem 2) 
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< /^ e [ x / 2 ] + (1 + e)Y,E[xi] + 


I 


KQ 

T 


(i+j)r"ii,/2 


(by Theorem 8) 


— \/2Kq ^(2 + 6l)y^E[Xf] + ~^(^^ + ll^^lla/2 



□ 


7. Suprema of empirical processes. In this section we apply the results 
of Section 2 to derive moment bounds for suprema of empirical processes. In 
particular, the main result of this section, Theorem 12, may be regarded as 
an analogue of Talagrand’s inequality [40] for moments. Indeed, Talagrand’s 
exponential inequality may be easily deduced from Theorem 12 by bounding 
the moment generating function by bounding all moments. 

As a first illustration, we point out that the proof of Khinchine’s inequality 
in the previous section extends, in a straightforward way, to an analogous 
supremum: 

Theorem 10. Let T C M” be a set of vectors t = and let 

Ai,..., Xn he independent Rademacher variables. If Z = sup^g.^- iiHi; 
then for any integer q>2, 


n 



where K = l/{e — y/e) < 0.935, and 


n 


\\{Z -E[Z])_\\q < y/2Ciq sup Vtf V 2y/^q supjtij, 

where Ci is defined as in Theorem 4. 

Before stating the main result of the section, we mention the following 
consequence of Corollary 3. 

Theorem 11. Let J- be a countable class of nonnegative functions de¬ 
fined on some measurable set X. Let Xi,...,Xn denote a collection of X- 
valued independent random variables. Let Z = supjgjpX^i/(^i) 


M= max sup/(Aj). 

l<i<nf^jr 
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Then, for all q> 2 and 9 G (0, 2), 

\\Z\\q < (1 + 9)E[Z] + 


Next we introduce the relevant quantities for the statement and proof of 
our main theorem about moments of centered empirical processes. 

Let T denote a countable class of measurable functions from —> M. Let 
Xi,..., Xn denote independent valued random variables such that for all 
f € X and i = 1,..., n, E/(Xj) = 0. Let 

n 

Z = sup . 

i=l 

The fluctuations of an empirical process are known to be characterized by 
two quantities that coincide when the process is indexed by a singleton. The 
strong variance is defined as 


= E 

while the weak variance is dehned by 


sup5^/^(Xi) 
/ i 


a 


2 


supE 

/ 




A third quantity appearing in the moment and and tail bounds is 

M = sup|/(Ai)|. 
ij 

Before stating the main theorem, we first establish a connection between the 
weak and the strong variances of an empirical process: 


Lemma 7. 

< 0 -^ + 32Ve[M^E[Z] + 8E[m2]. 

If the functions in X are uniformly bounded, then E may be upper 
bounded by a quantity that depends on a and E[Z] thanks to the contraction 
principle (see [30]). Gine, Latala and Zinn [16] combine the contraction prin¬ 
ciple with a Hoffmann-Jprgensen-type inequality. To follow their reasoning, 
we need the following lemma. 


Lemma 8. Let Si,... ,£n denote independent Rademacher variables. Let A > 
4 and define to = a/A]E[M^]. Then 


E 


sup 

I 




sup/ \f{Xi)\>to 


< 


1 


(l-2/v/A)2 


E[m2 
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The proof of this lemma is postponed to Appendix A.3. 


Proof of Lemma 7. Let ei,...,£n denote independent Rademacher 
random variables, and let to = Then 


< E 


sup 

/ 




+ supE 
/ 




< + 2E 


< + 2E 


sup 

/ 


i 

(by the symmetrization inequalities [29], Lemma 6.3) 


sup 

/ 


+ 2E 


sup 

/ 


i 


<a^ + 4toE 


sup 

/ 


E^®/(^®) 


+ 2E 


sup 

/ 




<a+ 4toE 


(the contraction principle for Rademacher averages [29], Lemma 6.5 
since u t—>-/{2to) is contracting on [—toTo]) 

2 


sup 

/ 




+ 


<a^+ 8v'XK[M^]\\Z\\i + 


{1-2/Vxy 

-E[m2] 


E[M^ 


(by Lemma 8) 


(l-2/v/A)2 

which, by taking A = 16, completes the proof. □ 

The next theorem offers two upper bounds for the moments of suprema 
of centered empirical processes. The first inequality improves inequality (3) 
of [35]. The second inequality is a version of Proposition 3.1 of [16]. It follows 
from the first combined with Lemma 8. 

Theorem 12. Let T denote a countable class of measurable functions 
from A —> M. Let Xi,... ,Xn denote independent X-valued random variables 
such that for all f G if and i = 1,... ,n, E/(Aj) = 0. Let 


Z = sup 
/6.F 




2 = 1 
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Then for all q>2, 

\\{Z-nZ])+\\g<V^{T + a) + 2Kq(\\M\\,+ sup \\f{Xi)\\ 2 ), 

\ ijer ) 

and furthermore, 

\\Z\\q < 2EZ + 2ay/2K,q + 20«;(?||M||q + 4y^||M||2. 


Proof. The proof uses Theorem 2 which states that 
\\{Z-nZ])+\\q<V2Kq\\V+\\q/^. 

We may bound as follows: 

n 

< sup^E[(/(W) - f{X')f\X^] 

n 

<sup^(E[/(W)2] + /(W)2) 
n n 

< sup^E[/(Xi)^] + sup^/(W)^- 

Thus, by Minkowski’s inequality and the Cauchy-Schwarz inequality, 


VW^2< 


sup^E[/(X,)2] + 


< fj + 


= (T + 


sup 


SUp^/(Xi)2 


q/2 




/e^ \ i=i 


sup sup ^aif{Xi) 

/eJF a -. ||q:|| 2<1 i=l 


< fj + s + 


sup 

yf&T,a : ||o||2<l j=i 


-E 


sup ^aif{Xi) 
_/eJF,a: ||a||2<l j=i 


The last summand may be upper bounded again by Theorem 2. Indeed, the 
corresponding V~^ is not more than 


max sup f'^{Xi) + maxsupE[/^(Xj)], 

* f&r * f&r 
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and thus 


\ sup ^ai/(Xi) 

-E 

n 

sup '^aif{Xi) 

) 

\/eJE,a: ||a||2<l j=i 


_/e.E,a: ||o|| 2 <lj=i 

/ + 


< \f2^[ ||M||g +maxsup ||/(Xi )||2 ]. 

\ * f&r J 

This completes the proof of the first inequality of the theorem. The second 
inequality follows because by nonnegativity of Z, \\{Z — E[Z])_ ||g < EZ and 
therefore \\Z\\q < EZ + ||(Z — E[Z])+||g and since by the first inequality, 
combined with Lemma 7, we have 

||(Z-E[Z])+||g < ^/^(cr + \/32\/E[m 2]E[Z] + \/8E[M2] + a) 

+ 2Kq(\\M\\q + sup ||/(Xi)|| 2 ) 

\ ij&T / 

< E[Z] + 2aV2Kq + 16k'\/e[M^] + \/l6KqE[M^ 

+ 2Kq(\\M\\q+ sup ||/(Xi)|| 2 ) 

V L/6.F / 

(using the inequality Vab <a + b/A). 

Using ||M ||2 < ||M||q and supjjg^p ||/(Xj )||2 < ||M|| 2 , we obtain the desired 
result. □ 

8. Conditional Rademacher averages. Let be a countable class of mea¬ 
surable real-valued functions. The conditional Rademacher average is dehned 
by 


Z = E 

sup 


x^ 


./e.E 

i 



where the are i.i.d. Rademacher random variables. Conditional Rademacher 
averages play a distinguished role in probability in Banach spaces and in sta¬ 
tistical learning theory (see, e.g., [1, 2, 3, 22, 23]). When the set of functions 
is bounded, Z has been shown to satisfy a Bernstein-like inequality [7]. Here 
we provide bounds on the growth of moments in the general case. 

Theorem 13. Let Z denote a conditional Rademacher average and let 
M = supjj f{Xi). Then 

||(Z - E[Z])+||, < V2Kq\\M\\qE[Z] + Kq\\M\\q 

and 

||(Z - E[Z])_||, < v^{V(?||M||,E[Z] + 2q\\M\\q}. 
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Proof. Define 


Z, = E 

sup 


Xf 



j¥=i 

- 


The monotonicity of conditional Rademacher averages with respect to the 
sequence of summands is well known, as it was at the core of the early 
concentration inequalities used in the theory of probability in Banach spaces 
(see [29]). Thus, for alH, Z — Zi>0 and 

Y,{Z-Z,)<Z. 

i 

Thus, we have 

V < ZM and Z — Zi < M. 

The result now follows by Corollary 3, noticing that M = W. □ 

9. Moment inequalities for Rademacher chaos. Throughout this section, 
Xi,X 2 ,... ,Xn denote independent Rademacher random variables. Let In^d 
be the family of subsets of {1,... ,n} of size d {d < n). Let T denote a set 
of vectors indexed by In,d- 'd' is assumed to be a compact subset of 

In this section we investigate suprema of Rademacher chaos indexed by 
T of the form 


Z = sup 



t&T 

d 

Vie/ / 


For each 1 < A: < d, let Wk be defined as 
Wk = sup sup 

tGT : ||Q!(^)||2<l,/l<fc 

E E .«u,; 

(Note that is just a constant, and does not depend on the value of the 
Xi's.) The main result of this section is the following. 

Theorem 14. Let Z denote the supremum of a Rademacher chaos of 
order d and let Wi, ..., Wd be defined as above. Then for all reals q <2, 

d-l 

\\{Z - E[Z])+\\q < '^{4Kqy/^E[Wj] + 
i=i 

<Y,{4Kqy/^E[Wfi. 
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Before proving the theorem, we show how it can be used to obtain expo¬ 
nential bounds for the upper tail probabilities. In the special case of d = 2 
we recover an inequality proved by Talagrand [40]. 


Corollary 4. For all t>0, 


¥{Z >E[Z] + t} <2exp(/\ 


\ V2(iE[IUj] 


2/i' 


Proof. 


By Theorem 14, for any q, 


F{Z > E[Z] + t}< 


E[(Z-E[Z])+]'? 


The right-hand side is at most 2“'? if for all j = 1,... ,d, {4:Kqy^'^K[Wj] < 
t/{2d). Solving this for q yields the desired tail bound. □ 


Proof of Theorem 14. The proof is based on a simple repeated appli¬ 
cation of Theorem 2. First note that the case d = l follows from Theorem 10. 
Assume that d > 1. By Theorem 2, 

\\{Z -K[Z])+\U<V^\\Vv^\y. 

Now straightforward calculation shows that 

Vv+<V2Wi 

and therefore 

ll(Z - E[Z])+\\g < V^V2{E[Wi] + ||(lFi - E[lTi])+||,). 

To bound the second term on the right-hand side, we use, once again, The¬ 
orem 2. Denoting the random variable V~^ corresponding to Wi by , we 
see that Vy < 2 W 2 , so we get 

IKlTi -E[lTi])+||, <^/^^/2(E[lT2] + \\{W2-F[W2])+\\q). 

We repeat the same argument. For A: = 1,..., d — 1, let Vy denote the vari¬ 
able corresponding to Wk- Then 

vy — ^ ®^P 

i:( i: ( n i: ndi’)*!-.4 )uj 


= 21F|+i 
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Thus, using Theorem 2 for each Wk, A: < d — 1, we obtain the desired inequal¬ 
ity. 

□ 

Remark. Here we consider the special case d = 2. Let T denote a 
set of symmetric matrices with zero diagonal entries. The set T defines 
a Rademacher chaos of order 2 by 


Z = 2 sup 
ter 




Let y be defined as 


Y =sup sup 

ter a: ||a||2<l j=i 

and let B denote the supremum of the L 2 operator norms of matrices t €T. 
Theorem 14 implies the following moment bound for q>2: 

\\{Z - E[Z])+\\q < 4^E[y] + 4:V2V^qB. 

By Corollary 4, this moment bound implies the following exponential upper 
tail bound for Z: 

+2 


F{Z > E[Z] -|-1} < 2 exp ( — log(2) 




A 


t 


: • 


'64KE[y]2 l6^/2^/^BJ' 

This is equivalent to Theorem 17 in [7] and matches the upper tail bound 
stated in Theorem 1.2 in [40]. Note, however, that with the methods of this 
paper we do not recover the corresponding lower tail inequality given by 
Talagrand. 

We finish this section by pointing out that a version of Bonami’s inequality 
[5] for Rademacher chaos of order d may also be recovered using Theorem 
14. 


Corollary 5. 
Then 

(9.1) 


Let Z be a supremum of Rademacher chaos of order d. 


IZI 




— 1 

yJAKqd — 1 


Note that Bonami’s inequality states \\Z\\q <{q — 1)‘^/^||Z||2 so that the 
bound obtained by Theorem 14 has an extra factor of the order of in 
the constant. This loss in the constant seems to be an inevitable artifact 
of the tensorization at the basis of our arguments. On the other hand, the 
proof based on Theorem 2 is remarkably simple. 
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Sketch of proof of Corollary 5. By Theorem 14, it suffices to 
check that for all j, 1 < J < rf, 

E[Wj]<d^^^\\Z\\2. 


Letting Wo = Z, the property obviously holds for j = 0. Thus, it is enough 
to prove that for any k > 1, 

nwk]<\\Wk\\2<Vd\\Wk-i\\2. 

To this end, it suffices to notice that, on the one hand, 




sup sup 









hv^A: ■ {h ; ■ ■—1 k h 1 / 

(the cumbersome but pedestrian proof of this identity is omitted), and on 
the other hand, 



l|w^fc-illi = E 


sup sup 

i; 

J,J'€In,d-(k-l) kjeJ / VjeJ' 

/ /k-1 


T. 


{U vi^A;_l}Ur7 


^ /k-l 


E ( 


}UJ' 


^ H ■ {h V 4fc — 1 }Ot/ ^^n,d 1 


Noticing that the contraction principle for Rademacher sums (see [29], The¬ 
orem 4.4) extends to Rademacher chaos in a straightforward way, and using 
the fact that | Jn J'| <d, we get the desired result. □ 
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10. Boolean polynomials. The suprema of Rademacher chaos discussed 
in the previous section may be considered as special cases of suprema of 
?7-processes. In this section we consider another family of [/-processes, de¬ 
fined by bounded-degree polynomials of independent {0, l}-valued random 
variables. An important special case is the thoroughly studied problem of 
the number of occurrences of small subgraphs in a random graph. 

In this section Xi ,..., Xn denote independent {0, l}-valued random vari¬ 
ables. Just like in the previous section, Tn^d denotes the set of subsets of size 
d of {l,...,n} and T denotes a compact set of nonnegative vectors from 

M^d). Note that in many applications of interest, for example, in subgraph¬ 
counting problems, T is reduced to a single vector. 

The random variable Z is defined as 


Z = sup 
ter 


E 


nv. 


ti. 


For the case d = 1, moment bounds for Z follow from Theorem 11. For 
k = 0,1,... ,d — 1, let Mk be defined as 


max sup 

J^^n,d — k 


E {n 


t]. 


Note that all are again suprema of nonnegative boolean polynomials, 
but the degree of Mk is k <d. 

Lower tails for Boolean polynomials are by now well understood thanks 
to the Janson-Suen inequalities [17, 38]. On the other hand, upper tails for 
such simple polynomials are notoriously more difficult; see [20] for a survey. 
We obtain the following general result. 


Theorem 15. Let Z and Mk he defined as above. For all reals q>2, 

\\{z-nz])+\U 

Proof. The proof is based on a repeated application of Corollary 3, 
very much in the spirit of the proof of Theorem 14. For each i € {1,... ,n}, 
define 

Zi = sup ( n 

Vie/ 

The nonnegativity assumption for the vectors t € T implies that Zi < Z. 
Moreover, 


Z — Zi< Md-i 
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and 

J2iZ - Zi) <dZ. 

i 

Thus, V < dMd-iZ. Hence, by Corollary 3, 

\\{Z - E[Z])+||q < y/2Kqd\\Md-i\\q^[Z] + Kdq\\Md-i\\q. 

We may repeat the same reasoning to each k = d — 1,... ,1, to obtain 

By induction on A:, we get 

\\Md-i\\q < 2| 

which completes the proof. □ 


Remark. Just like in the case of Rademacher chaos, we may easily 
derive an exponential upper tail bound. By a similar argument to Corollary 
4, we get 


P{Z>E[Z] +f} 

log 2 


< exp — - 


dn 


A 


( _^_ 

V4dVlE[^]E[Mj 


2/i 


A 


4dE[Md_j 


i/j' 


Remark. If d = 1, Theorem 15 provides Bernstein-like bounds, and is, 
in a sense, optimal. For higher order, naive applications of Theorem 15 may 
not lead to optimal results. Moment growth may actually depend on the 
special structure of T. Consider the prototypical triangle counting problem 
(see [18] for a general introduction to the subgraph counting problem). 

In the 0{n,p) model, a random graph of n vertices is generated in the 
following way: for each pair {u,v} of vertices, an edge is inserted between u 
and V with probability p. Edge insertions are independent. Let Xu,v denote 
the Bernoulli random variable that is equal to 1 if and only if there is an 
edge between u and v. Three vertices u,v and w form a triangle if = 
= Xw,z = 1. In the triangle counting problem, we are interested in the 
number of triangles 


Z = 


{u,V,w}eIn,3 




Note that for this particular problem, 

AJl = sup ^ ] Xu^wXd^w 
{u,v}&In 2 y, ■ u:^{u,v} 
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Ml is thus the maximum of ( 2 ) (correlated) binomial random variables with 
parameters n — 2 and p^. Applying Corollary 1 (with A = 2), we get for 
9 > 1 , 


||Mi||,<nA(E[Mi]+g-l). 
Simple computations reveal that 

E[Mi] < 2(logn + np^). 


Applying Corollary 3 to Z, we hnally get 


II {Z - nz])+\\g < V6KqE[Mi]E[Z] 

+ q{\j6KK[Z] + 3«;(n A (E[Mi] + 3{q - 1)))), 


which represents an improvement over what we would get from Theorem 15, 
and provides exponential bounds with the same flavor as those announced 
in [7]. However, the above inequality is still not optimal. In the following 
discussion we focus on upper bounds on ¥{Z > 2E[Z]} when p > logn/n. 

The inequality above, taking q = or q = [ ^ 2 ^ J ’ i™plies that for 

sufficiently large n. 


P{Z > 2E[Z]} < exp 


4 n?p^ 

‘°®3l4S 


V log 2 



Recent work by Kim and Vu [21] show that better, and in a sense optimal, 
upper bounds can be obtained with some more work; see also [19] for related 
recent results. Kim and Vu use two ingredients in their analysis. In a first 
step, they tailor Bernstein’s inequality for adequately stopped martingales 
to the triangle counting problem. This is not enough since it provides bounds 
comparable to the above inequality. In the martingale setting, this apparent 
methodological weakness is due to the fact that the quadratic variation 
process {Z) associated with Z may suffer from giant jumps [larger than 
0(n^p^)] with a probability that is larger than exp(—0(n^p^)). In the setting 
advocated here, huge jumps in the quadratic variation process are reflected 
in huge values for Mi. [In fact, the probability that Mi > np is larger than 
the probability that a single binomial random variable with parameters n 
and p'^ is larger than np which is larger than exp(—0(np)).] In order to get 
the right upper bound, Kim and Vu suggest a partitioning device. An edge 
(u,u) is said to be good if it belongs to less than np triangles. A triangle 
is good if its three edges are good. Let Z^ and Z^ denote the number of 
good and bad triangles. In order to bound the probability that Z is larger 
than 2E[Z], it suffices to bound the probability that Z^ > 3/2E[Z] and that 
Z^ > E[Z]/2. Convenient moment bounds for Z^ can be obtained easily using 
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the main theorems of this paper. Indeed /np satisfies the conditions of 
Corollary 1 with A = 3. Hence, 


\\{Z^ 


nz^])+\u 


< \/k 


V3qnpE[Z^] + 


2>qnp' 


This moment bound implies that 

P|Z5 > ^E[Z]| < exp(^-log^^). 


We refer the reader to ([21], Section 4.2) for a proof that ¥{Z^ > E[Z]/2} is 
upper bounded by exp(—0(n^p^)). 

The message of this remark is that (infamous) upper tail bounds concerning
multilinear Boolean polynomials that can be obtained using Bernstein
inequalities for stopped martingales can be recovered using the moment
inequalities stated in the present paper. However, to obtain optimal bounds,
subtle ad hoc reasoning still cannot be avoided.
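To make the partitioning device concrete, here is a small Python sketch (not from the paper; the function name is hypothetical) that samples an Erdős–Rényi graph $G(n,p)$, classifies edges and triangles as good or bad with the threshold $np$ used above, and checks two counting facts related to the conditions of Corollary 1: every edge lies in fewer than $np$ good triangles, and counting good triangles edge by edge counts each of them exactly three times (the origin of $A = 3$).

```python
import itertools
import random


def good_bad_triangles(n, p, seed=0):
    """Sample G(n, p); return (Z, Z_good, Z_bad, max good triangles per edge).

    Hypothetical helper: an edge is good when it lies in fewer than n*p
    triangles, and a triangle is good when its three edges are good.
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)

    # Every triangle u < v < w, stored as its three (sorted) edges.
    triangles = [
        ((u, v), (u, w), (v, w))
        for u, v in itertools.combinations(range(n), 2)
        if v in adj[u]
        for w in range(v + 1, n)
        if w in adj[u] and w in adj[v]
    ]

    # The number of triangles through an edge decides whether it is good.
    through = {}
    for edges in triangles:
        for e in edges:
            through[e] = through.get(e, 0) + 1
    good = [t for t in triangles if all(through[e] < n * p for e in t)]

    # Good triangles through each edge; summing over edges gives 3 * Z_good.
    good_through = {}
    for edges in good:
        for e in edges:
            good_through[e] = good_through.get(e, 0) + 1
    assert sum(good_through.values()) == 3 * len(good)

    max_good = max(good_through.values(), default=0)
    return len(triangles), len(good), len(triangles) - len(good), max_good


z, z_good, z_bad, max_good = good_bad_triangles(60, 0.2)
```

A good triangle has only good edges, and a good edge lies in fewer than $np$ triangles overall, which is why `max_good` stays below $np$.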


APPENDIX 

A.1. Modified φ-Sobolev inequalities. Recall the notation used in Section 3.
As pointed out in [25], provided that $\phi''$ is strictly positive, the
condition "$1/\phi''$ concave" is necessary for the tensorization property to hold.
Here we point out the stronger property that the concavity of $1/\phi''$ is a
necessary condition for the φ-entropy $H_\phi$ to be convex on the set $L_\infty^+(\Omega,\mathcal{A},\mathbb{P})$
of bounded and nonnegative random variables.


Proposition A.1. Let $\phi$ be a strictly convex function on $\mathbb{R}_+$ which is
twice differentiable on $\mathbb{R}_+^*$. Let $(\Omega,\mathcal{A},\mathbb{P})$ be a rich enough probability space
in the sense that $\mathbb{P}$ maps $\mathcal{A}$ onto $[0,1]$. If $H_\phi$ is convex on $L_\infty^+(\Omega,\mathcal{A},\mathbb{P})$, then
$\phi''(x) > 0$ for every $x > 0$ and $1/\phi''$ is concave on $\mathbb{R}_+^*$.

Proof. Let $\theta \in [0,1]$ and $x, x', y, y'$ be positive real numbers. Under the
assumption on the probability space we can define a pair of random variables
$(X,Y)$ to be $(x,y)$ with probability $\theta$ and $(x',y')$ with probability $(1-\theta)$.
Then the convexity of $H_\phi$ means that

$$H_\phi(\lambda X + (1-\lambda)Y) \le \lambda H_\phi(X) + (1-\lambda)H_\phi(Y)$$

for every $\lambda \in (0,1)$. Defining, for every $(u,v) \in \mathbb{R}_+^* \times \mathbb{R}_+^*$,

$$F_\lambda(u,v) = -\phi(\lambda u + (1-\lambda)v) + \lambda\phi(u) + (1-\lambda)\phi(v),$$

the inequality is equivalent to

$$F_\lambda(\theta(x,y) + (1-\theta)(x',y')) \le \theta F_\lambda(x,y) + (1-\theta)F_\lambda(x',y').$$

Hence, $F_\lambda$ is convex on $\mathbb{R}_+^* \times \mathbb{R}_+^*$. This implies, in particular, that the
determinant of the Hessian matrix of $F_\lambda$ is nonnegative at each point $(x,y)$.
Thus, setting $x_\lambda = \lambda x + (1-\lambda)y$,

$$[\phi''(x) - \lambda\phi''(x_\lambda)][\phi''(y) - (1-\lambda)\phi''(x_\lambda)] \ge \lambda(1-\lambda)[\phi''(x_\lambda)]^2,$$

which means that

$$\phi''(x)\phi''(y) \ge \lambda\phi''(y)\phi''(x_\lambda) + (1-\lambda)\phi''(x)\phi''(x_\lambda). \tag{A.1}$$

If $\phi''(x) = 0$ for some point $x$, we see that either $\phi''(y) = 0$ for every $y$, which
is impossible because $\phi$ is assumed to be strictly convex, or there exists some
$y$ such that $\phi''(y) > 0$ and then $\phi''$ is identically equal to 0 on the nonempty
open interval with endpoints $x$ and $y$, which also leads to a contradiction
with the assumption that $\phi$ is strictly convex. Hence $\phi''$ is strictly positive
at each point of $\mathbb{R}_+^*$ and (A.1) leads to

$$\frac{1}{\phi''(\lambda x + (1-\lambda)y)} \ge \frac{\lambda}{\phi''(x)} + \frac{1-\lambda}{\phi''(y)},$$

which means that $1/\phi''$ is concave. □
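As a numerical illustration of (A.1) (not part of the proof), the sketch below uses the hypothetical family $\phi(x) = x^p$, for which $\phi''(x) = p(p-1)x^{p-2}$ and $1/\phi''$ is concave exactly when $1 < p \le 2$. It evaluates the difference of the two sides of (A.1), which equals $\det \nabla^2 F_\lambda(x,y)/(\lambda(1-\lambda))$, on a grid: it stays nonnegative up to rounding for $p = 3/2$ and becomes negative for $p = 3$, where $1/\phi''(x) = 1/(6x)$ is convex.

```python
def phi2_power(p):
    """Second derivative of the hypothetical example phi(x) = x**p on x > 0."""
    return lambda s: p * (p - 1) * s ** (p - 2)


def a1_gap(phi2, x, y, lam):
    """Left side minus right side of (A.1) at (x, y, lambda).

    This equals det(Hessian of F_lambda at (x, y)) / (lam * (1 - lam)).
    """
    x_lam = lam * x + (1 - lam) * y
    return (phi2(x) * phi2(y)
            - lam * phi2(y) * phi2(x_lam)
            - (1 - lam) * phi2(x) * phi2(x_lam))


def min_gap(p, points, lams):
    """Smallest value of the (A.1) gap over a grid of (x, y, lambda)."""
    phi2 = phi2_power(p)
    return min(a1_gap(phi2, x, y, lam)
               for x in points for y in points for lam in lams)


points = [0.1 * k for k in range(1, 51)]   # x, y in (0, 5]
lams = [0.1 * k for k in range(1, 10)]     # lambda in (0, 1)

print(min_gap(1.5, points, lams))  # nonnegative up to rounding
print(min_gap(3.0, points, lams))  # negative: 1/phi'' = 1/(6x) is convex
```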


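Proof of Proposition 2. Without loss of generality we may assume
that $\phi(0) = 0$. If $\phi$ is strictly convex,

$$\begin{aligned}
\frac{1}{\phi''((1-\lambda)u + \lambda x)}
&\ge \frac{1-\lambda}{\phi''(u)} + \frac{\lambda}{\phi''(x)} && \text{(by concavity of $1/\phi''$)}\\
&\ge \frac{\lambda}{\phi''(x)} && \text{(by positivity of $\phi''$, i.e., strict convexity of $\phi$).}
\end{aligned}$$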


In any case, the concavity of $1/\phi''$ implies that for every $\lambda \in (0,1)$ and every
positive $x$ and $u$,

$$\lambda\phi''((1-\lambda)u + \lambda x) \le \phi''(x),$$

which implies that for every positive $t$,

$$\lambda\phi''(t + \lambda x) \le \phi''(x).$$

Letting $\lambda$ tend to 1, we derive from the above inequality that $\phi''$ is
nonincreasing, that is, $\phi'$ is concave. Setting $\psi(x) = \phi(x)/x$, one has

$$x^3\psi''(x) = x^2\phi''(x) - 2x\phi'(x) + 2\phi(x) = f(x).$$

The convexity of $\phi$ and its continuity at 0 imply that $x\phi'(x)$ tends to 0 as
$x$ goes to 0. Also, the concavity of $\phi'$ implies that

$$x^2\phi''(x) \le 2x(\phi'(x) - \phi'(x/2)),$$

so $x^2\phi''(x)$ tends to 0 as $x \to 0$ and therefore $f(x) \to 0$ as $x \to 0$. Denoting
(abusively) by $\phi^{(3)}$ the right derivative of $\phi''$ (which is well defined since $1/\phi''$
is concave) and by $f'$ the right derivative of $f$, we have $f'(x) = x^2\phi^{(3)}(x)$.
Then $f'(x)$ is nonpositive because $\phi''$ is nonincreasing. Thus, $f$ is
nonincreasing. Since $f$ tends to 0 at 0, this means that $f$ is a nonpositive function
and the same property holds for the function $\psi''$, which completes the proof
of the concavity of $\psi$. □
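Two standard examples (our own illustration, not taken from the text) show what Proposition 2 gives in practice: for $\phi(x) = x^p$ with $1 < p \le 2$ and for $\phi(x) = x\log x$, the function $1/\phi''$ is concave, the quantity $f(x) = x^3\psi''(x)$ from the proof is nonpositive, and $\psi(x) = \phi(x)/x$ is concave.

```latex
\begin{aligned}
&\phi(x) = x^p\ (1 < p \le 2): &&
  \frac{1}{\phi''(x)} = \frac{x^{2-p}}{p(p-1)}, &&
  f(x) = \bigl(p(p-1) - 2p + 2\bigr)x^p = (p-1)(p-2)\,x^p \le 0, &&
  \psi(x) = x^{p-1};\\
&\phi(x) = x\log x: &&
  \frac{1}{\phi''(x)} = x, &&
  f(x) = x - 2x(\log x + 1) + 2x\log x = -x \le 0, &&
  \psi(x) = \log x.
\end{aligned}
```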

Proof of Lemma 2. Without loss of generality we assume that $\phi(0) =
0$. The convexity of $\phi$ implies that for every positive $u$,

$$-\phi(\mathbb{E}[Z]) \le -\phi(u) - (\mathbb{E}[Z] - u)\phi'(u),$$

and therefore

$$H_\phi(Z) \le \mathbb{E}[\phi(Z) - \phi(u) - (Z - u)\phi'(u)].$$

Since the latter inequality becomes an equality when $u = \mathbb{E}[Z]$, the variational
formula (3.7) is proven. Since $Z'$ is an independent copy of $Z$, we derive
from (3.7) that

$$\begin{aligned}
H_\phi(Z) &\le \mathbb{E}[\phi(Z) - \phi(Z') - (Z - Z')\phi'(Z')]\\
&\le -\mathbb{E}[(Z - Z')\phi'(Z')]
\end{aligned}$$

and by symmetry

$$2H_\phi(Z) \le -\mathbb{E}[(Z' - Z)\phi'(Z)] - \mathbb{E}[(Z - Z')\phi'(Z')],$$

which leads to (3.8). To prove (3.9), we simply note that

$$\frac12\,\mathbb{E}[(Z - Z')(\psi(Z) - \psi(Z'))] - H_\phi(Z) = -\mathbb{E}[Z]\,\mathbb{E}[\psi(Z)] + \phi(\mathbb{E}[Z]).$$

But the concavity of $\psi$ implies that $\mathbb{E}[\psi(Z)] \le \psi(\mathbb{E}[Z]) = \phi(\mathbb{E}[Z])/\mathbb{E}[Z]$ and
we derive from the preceding identity that (3.9) holds. □
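The identities in this proof can be checked exactly on a finite distribution. The sketch below is an illustration under the hypothetical choice $\phi(x) = x^{3/2}$, so that $\psi(x) = x^{1/2}$ is concave. It computes $H_\phi(Z)$, the right-hand sides $\frac12\mathbb{E}[(Z-Z')(\phi'(Z)-\phi'(Z'))]$ and $\frac12\mathbb{E}[(Z-Z')(\psi(Z)-\psi(Z'))]$ that the proof yields for (3.8) and (3.9) (summing over the product law of $(Z, Z')$), the residual of the identity used for (3.9), and the functional minimized in (3.7).

```python
import itertools

P = 1.5  # hypothetical exponent: phi(x) = x**P, psi(x) = x**(P - 1)


def phi(x):
    return x ** P


def dphi(x):
    return P * x ** (P - 1)


def psi(x):
    return phi(x) / x


def lemma2_terms(values, probs):
    """Exact quantities from the proof of Lemma 2 for a finite law of Z > 0."""
    law = list(zip(values, probs))
    mean = sum(q * z for z, q in law)
    h = sum(q * phi(z) for z, q in law) - phi(mean)  # H_phi(Z)
    # Z' is an independent copy of Z: sum over the product law.
    pairs = list(itertools.product(law, repeat=2))
    rhs_38 = sum(0.5 * q1 * q2 * (z1 - z2) * (dphi(z1) - dphi(z2))
                 for (z1, q1), (z2, q2) in pairs)
    rhs_39 = sum(0.5 * q1 * q2 * (z1 - z2) * (psi(z1) - psi(z2))
                 for (z1, q1), (z2, q2) in pairs)
    e_psi = sum(q * psi(z) for z, q in law)
    # Identity: rhs_39 - H_phi(Z) = -E[Z] E[psi(Z)] + phi(E[Z]).
    identity_residual = (rhs_39 - h) - (-mean * e_psi + phi(mean))
    return h, rhs_38, rhs_39, identity_residual


def variational(values, probs, u):
    """E[phi(Z) - phi(u) - (Z - u) phi'(u)], the functional in (3.7)."""
    return sum(q * (phi(z) - phi(u) - (z - u) * dphi(u))
               for z, q in zip(values, probs))


values, probs = [0.5, 1.0, 3.0], [0.2, 0.5, 0.3]
h, rhs_38, rhs_39, residual = lemma2_terms(values, probs)
mean = sum(q * z for z, q in zip(values, probs))
```

With this choice $\phi' = \frac32\psi$, so the (3.9)-type bound is smaller than the (3.8)-type bound by a factor $2/3$.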

Proof of Theorem 6. Fix first $x \le y$. Under the assumption that
$g = \phi' \circ f$ is convex,

$$\phi(f(y)) - \phi(f(x)) - (f(y) - f(x))\phi'(f(x)) \le \frac12 (y - x)^2 f'^2(y)\,\phi''(f(y)). \tag{A.2}$$

Indeed, denoting by $h$ the function

$$h(t) = \phi(f(y)) - \phi(f(t)) - (f(y) - f(t))\phi'(f(t)),$$

we have

$$h'(t) = -g'(t)(f(y) - f(t)).$$

But for every $t \le y$, the monotonicity and convexity assumptions on $f$ and
$g$ yield

$$0 \le g'(t) \le g'(y) \quad\text{and}\quad 0 \le f(y) - f(t) \le (y - t)f'(y),$$

hence

$$-h'(t) \le (y - t)f'(y)g'(y).$$

Integrating this inequality with respect to $t$ on $[x,y]$ leads to (A.2). Under
the assumption that $\psi \circ f$ is convex, we notice that

$$0 \le f(y) - f(x) \le (y - x)f'(y)$$

and

$$0 \le \psi(f(y)) - \psi(f(x)) \le (y - x)f'(y)\,\psi'(f(y)),$$

which implies

$$(f(y) - f(x))(\psi(f(y)) - \psi(f(x))) \le (x - y)^2 f'^2(y)\,\psi'(f(y)). \tag{A.3}$$

The tensorization inequality combined with (3.7) and (A.2) leads to

$$H_\phi(f(Z)) \le \frac12\sum_{i=1}^n \mathbb{E}\left[(Z - Z_i)^2 f'^2(Z)\,\phi''(f(Z))\right]$$

and therefore to the first inequality of Theorem 6, while we derive from the
tensorization inequality, (3.9) and (A.3) that

$$H_\phi(f(Z)) \le \sum_{i=1}^n \mathbb{E}\left[(Z - Z_i')_+^2 f'^2(Z)\,\psi'(f(Z))\right],$$

which means that the second inequality of Theorem 6 indeed holds.

In order to prove the third inequality, we simply define $\tilde f(x) = f(-x)$
and $\tilde Z = -Z$. Then $\tilde f$ is nondecreasing and convex and we can use the first
inequality of Theorem 5 to bound $H_\phi(f(Z)) = H_\phi(\tilde f(\tilde Z))$, which gives

n 

H^ifiZ)) <Y.mZ - Z[)lf'\Z)iP'ifiZ))] 

i=l 

n 

<Y,niz - zi)if\z)ip\fiz))], 

i=l 


completing the proof of the result. □ 
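A quick numerical sanity check of (A.2) and (A.3), under the hypothetical choices $\phi(x) = x\log x$ (so $\phi'' = 1/x$ and $\psi = \log$) and $f = \exp$, which is nondecreasing and convex, with $\phi' \circ f(t) = t + 1$ and $\psi \circ f(t) = t$ both convex. The script evaluates both sides on a grid of pairs $x \le y$.

```python
import math


def phi(s):
    return s * math.log(s)


def dphi(s):
    return math.log(s) + 1.0


def d2phi(s):
    return 1.0 / s


def dpsi(s):  # psi(s) = phi(s) / s = log(s)
    return 1.0 / s


f = math.exp   # hypothetical f: nondecreasing and convex
df = math.exp  # its derivative


def a2_sides(x, y):
    """Both sides of (A.2) for x <= y."""
    lhs = phi(f(y)) - phi(f(x)) - (f(y) - f(x)) * dphi(f(x))
    rhs = 0.5 * (y - x) ** 2 * df(y) ** 2 * d2phi(f(y))
    return lhs, rhs


def a3_sides(x, y):
    """Both sides of (A.3) for x <= y, with psi = log."""
    lhs = (f(y) - f(x)) * (math.log(f(y)) - math.log(f(x)))
    rhs = (x - y) ** 2 * df(y) ** 2 * dpsi(f(y))
    return lhs, rhs


grid = [k / 4 for k in range(-8, 9)]  # points in [-2, 2]
pairs = [(x, y) for x in grid for y in grid if x <= y]
```

In this example (A.2) reduces to $e^{-d} \le 1 - d + d^2/2$ and (A.3) to $1 - e^{-d} \le d$, with $d = y - x \ge 0$.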
A.2. Proof of Lemma 6. By Stirling's formula,

$$k! = \left(\frac{k}{e}\right)^k \sqrt{2\pi k}\, e^{\beta_k},$$

where $\beta_k$ is positive and decreases to 0 as $k \to \infty$. Using the above formula
with $k = q - 2$, $k = q - 1$ and $k = q$ leads to


Xq < 


-/3,-i-l/2 {1JZ1\ , 1 A-/3.-2-1 ( 






\q{q-2)) ■ 


By the monotonicity of Stirling's correction, we have $\beta_q \le \beta_{q-1} \le \beta_{q-2}$ and
the preceding inequality becomes

1 g-l 


Xq<e 


q 


-) 




{q-2)q) 


Our aim is to prove that $x_q < 1$. Let


-.-1/2 


dn - 6 




u„ = 


e-\q-lY/\q-2)-^l^q-^l^ 


1 - On 


Then 


Xq ^ CLq UqiX U.) 

A 


and since $u_q \to K$ as $q \to \infty$, in order to show that $x_q < 1$, it is enough
to prove that $u_q < u_{q+1}$ for every $q \ge 4$. Let $\theta = 1/q$. Then $u_q < u_{q+1}$ is
equivalent to $g(\theta) > 0$, where


(A.4) 


g{e) = (1 - 20)^/^(l - 


-(1-0)1/2(1-02)1/4, 

Now, $t \mapsto t^{-2}\bigl((1-t)^{1/2} - (1-2t)^{1/4}\bigr)$ is easily seen to be increasing on $(0,1/2)$
(just notice that its power series expansion has nonnegative coefficients), so
since $\theta \le 1/4$, setting $\gamma = 16(\sqrt{3/4} - 2^{-1/4})$, one has $(1-2\theta)^{1/4} \ge (1-\theta)^{1/2} -
\gamma\theta^2$. Plugging this inequality in (A.4) yields

g{9) > (1 - 0)i/2(l - (1 - 02)1/4) _ _ e-i/2(l - 0)1/4) 

and therefore, using again that $\theta \le 1/4$,

g{9) > {lf\l - (1 - 0')'/") - 70^(1 - 

Finally, note that $1 - (1 - \theta^2)^{1/4} \ge \theta^2/4$, which implies
and since one can check numerically that the right-hand side of this
inequality is positive (more precisely, it is larger than 0.041), we derive that the
sequence $(u_q)$ is increasing and is therefore smaller than its limit $K$, and the
result follows.
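The elementary facts used in this proof can be checked numerically. The sketch below is an illustration only and does not reproduce the missing displays: it computes Stirling's correction $\beta_k = \log k! - k\log k + k - \frac12\log(2\pi k)$ and checks that it is positive and decreasing, evaluates $\gamma = 16(\sqrt{3/4} - 2^{-1/4})$, and checks on grids that $t \mapsto t^{-2}((1-t)^{1/2} - (1-2t)^{1/4})$ increases on $(0,1/2)$, that $(1-2\theta)^{1/4} \ge (1-\theta)^{1/2} - \gamma\theta^2$, and that $1 - (1-\theta^2)^{1/4} \ge \theta^2/4$ for $0 < \theta \le 1/4$.

```python
import math


def stirling_correction(k):
    """beta_k defined by k! = (k/e)**k * sqrt(2*pi*k) * exp(beta_k)."""
    return math.lgamma(k + 1) - (k * math.log(k) - k
                                 + 0.5 * math.log(2 * math.pi * k))


GAMMA = 16 * (math.sqrt(3 / 4) - 2 ** -0.25)


def ratio(t):
    """t**-2 * ((1 - t)**(1/2) - (1 - 2t)**(1/4)), claimed increasing on (0, 1/2)."""
    return ((1 - t) ** 0.5 - (1 - 2 * t) ** 0.25) / t ** 2


betas = [stirling_correction(k) for k in range(1, 201)]
ts = [i / 1000 for i in range(1, 500)]      # grid in (0, 1/2)
thetas = [i / 1000 for i in range(1, 251)]  # grid in (0, 1/4]
```

Since $\gamma$ is the value of `ratio` at $t = 1/4$, the inequality involving $\gamma$ is an equality at $\theta = 1/4$.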


A.3. Proof of Lemma 8. The statement follows from a version of
Hoffmann-Jørgensen's inequality. In particular, we use inequality (1.2.5s) on page 10
in [13] with $p = 1$ and $t = 0$. Then we obtain


E 


sup 

/ 


i^i)'^supf\f{Xi)\>to 

i 


< 


E[M2]V^ 

I-{AF[supf\Y.i£iP{Xi)\ 


2 


The right-hand side may be bounded further by observing that, by Markov's
inequality,


sup 

/ 


i 


>0 

— 

sup|/(Xi)| > to 



- f,i 


^ E[M2] _ 1 